R&D Considerations for Developing a Big Data Product

Big data is all the rage in the technology industry right now, and for good reason. Data holds information that, if analyzed properly, reveals invaluable insights you can apply to just about anything, from streamlining internal processes to delivering better health care. But given the potential complexity of its use cases, a big data product takes a lot of work and planning to execute properly.

LogRhythm has first-hand experience with big data products, given our development of CloudAI. LogRhythm CloudAI combines a wide array of behavioral models with artificial intelligence (AI) and machine learning (ML) to detect and characterize shifts in how users interact with the IT environment. This helps security teams detect hidden and advanced threats.

Joel Holsteen, a senior software engineer at LogRhythm, spoke about the process of building CloudAI as one example of a big data technology at an earlier Boulder/Denver BigData Meetup. The monthly Meetup educates the community on big data technologies and connects members with relevant companies. Over 70 people attended the event.

Holsteen’s presentation dove into the granular, technical considerations behind developing CloudAI. In the future, he’ll share a post here on that subject, so even if you weren’t at the event, you can still get a behind-the-scenes look at the development process.

However, the presentation also brought to light a number of questions LogRhythm considers in order to provide world-class products to our customers. Many of these apply to anyone developing a big data product. Before we share a post on the actual development process, we thought it was only appropriate to first cover the subjects we consider before we begin building. If you have a product idea brewing, check out the questions below and think about how they apply to your development process!

Questions We Consider

Business

Before we start actively developing, it is important to run through key business considerations. This will ensure our developers have a strong foundation to build upon. Questions include:

What are the business requirements for the product?

Our users’ needs will impact our product’s business requirements. They may want a product to help achieve something deadline-driven (like meeting compliance standards by a certain date). Furthermore, users may want that product delivered in a certain way, such as in the cloud. By taking the time to understand our users’ wants and needs at the start, we’re better prepared to develop the product when the time is right.

What experience do we already have in-house, and how do we leverage it to create the best product possible?

Even if a team hasn’t built a particular type of product before, that doesn’t necessarily mean it doesn’t have relevant experience to contribute. By assessing the skills of our team first, we have a better understanding of any significant gaps we need to address before development begins.

But even so, big data problems are complex and constantly changing, so it is important to understand that a big data product will always be a work in progress. There will always be ways to improve the product, so while striving for perfection is certainly admirable, we don’t consider it a concrete and final destination.

And on that note, a world-class big data product doesn’t necessarily require building all of the infrastructure from scratch. There are plenty of open-source tools available to build upon. This not only allows engineers to spend their time on areas that truly require dedicated development resources, but it also benefits the larger technical community and can assist others who are trying to address challenges of their own.

Data

It goes without saying that it’s worth investing some time to determine the product’s data needs and how to process everything it ingests. So, after we’ve answered our business questions, we explore the following:

How do we ensure we create a robust product?

Data doesn’t operate as simply as we’d all like it to, so we have to consciously make decisions to ensure the quality of our system. Data is sometimes delayed or sent out of order. Naturally, we want the correctness (i.e., the accuracy) of data to be as high as possible while limiting latency. By establishing requirements for correctness and latency, we avoid making technical decisions down the road that ultimately work against these requirements. Along with this, fault tolerance is key. Lost data hurts everyone, so we take the steps necessary to ensure our systems are redundant.
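To make the correctness-versus-latency trade-off concrete, here is a minimal sketch of one common way to cope with delayed and out-of-order data: count events in fixed time windows, but hold each window open for a grace period before finalizing it. This is an illustration under our own assumptions, not a description of how CloudAI works. The function name, the 60-second window, and the 30-second grace period are all hypothetical.

```python
from collections import defaultdict


def count_by_window(event_times, window_seconds=60, allowed_lateness=30):
    """Count events per fixed time window, tolerating some disorder.

    event_times: event timestamps (seconds) listed in ARRIVAL order,
    which may differ from the order in which the events occurred.
    Returns (counts per window start, number of events dropped as too late).
    All names and default values here are hypothetical.
    """
    open_windows = defaultdict(int)
    finalized = {}
    dropped = 0
    watermark = float("-inf")  # "we expect nothing older than this"

    for event_time in event_times:
        # The watermark trails the newest event seen by the grace period.
        watermark = max(watermark, event_time - allowed_lateness)
        start = event_time - (event_time % window_seconds)

        if start + window_seconds <= watermark:
            dropped += 1  # the window already closed: correctness suffers
        else:
            open_windows[start] += 1

        # Finalize every open window that ends at or before the watermark.
        closable = [s for s in open_windows if s + window_seconds <= watermark]
        for s in closable:
            finalized[s] = open_windows.pop(s)

    finalized.update(open_windows)  # end of input: flush what is left
    return finalized, dropped


arrivals = [5, 20, 70, 50, 130, 10, 125]
print(count_by_window(arrivals, allowed_lateness=30))
print(count_by_window(arrivals, allowed_lateness=0))
```

With the 30-second grace period, the event stamped 50 that arrives after the one stamped 70 is still counted, and only the very late event stamped 10 is dropped. With no grace period, results are final sooner, but both stragglers are lost. A longer grace period buys correctness at the cost of latency, which is why it helps to set requirements for both up front.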

What type of data processing technique is the best choice for our use case?

There are a number of different types of distributed data processing techniques, each with its own pros and cons. Ultimately, we want to select a method that best aligns with our latency, correctness, and cost requirements.
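As a rough illustration of how two broad styles of processing differ, the sketch below computes the same average twice: once in batch style, after all the data has been collected, and once in streaming style, updating the answer as each value arrives. It runs in a single process and leaves out everything that makes distributed processing hard; the function and class names are hypothetical and chosen only for this example.

```python
def batch_average(values):
    """Batch style: wait for the full data set, then compute once."""
    return sum(values) / len(values)


class RunningAverage:
    """Streaming style: keep a small state and update it per value."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean: shift the current mean toward the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


values = [4, 8, 6, 2]

# Batch: a single answer, available only after every value is in hand.
print(batch_average(values))

# Streaming: an up-to-date answer after each value arrives.
running = RunningAverage()
for value in values:
    print(running.update(value))
```

Both approaches end at the same average. The batch version is simpler but produces nothing until all the data is available, while the streaming version gives a low-latency answer at every step in exchange for maintaining state. That is the kind of trade-off we weigh against our latency, correctness, and cost requirements.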

Production

When working on any big data product development project, collaboration among teams is key. To get our project started off on the right foot, we ask ourselves:

How do we ensure our team is fully aligned to reach a common goal?

Engineers and data scientists each bring unique skills to the table, and it is critical that they work together rather than separately so that we can achieve a common goal – building a top-of-the-line product. At LogRhythm, we accomplish this by integrating data scientists into our engineering teams as much as possible, so that they work side by side with engineers. By reducing information silos and increasing collaboration, we ensure a smooth development process.

What else can we do to ensure we build a world-class product?

Quick feedback is critical to iterating on and improving a data analytics product. A customer’s perspective is extremely valuable; therefore, we try to incorporate customer feedback as early in the development cycle as possible. After all, our products are meant to serve our customers’ needs, and by addressing those needs at the start, we can better set the customers who raised them – and all of our other customers – up for success.

Examine Your Process

Product development is challenging enough as it is, and big data only adds to those challenges. But by taking time to address key questions in advance, we can eliminate unnecessary interruptions when it’s finally time to start developing the product.

So make sure to think about how these questions apply to your development process. And if you’re interested in learning more about the technical details Holsteen covered in his presentation, stay tuned! In addition to covering CloudAI, he’ll also provide some useful resources for further learning.