The Three Techniques for Improving Analytics ROI in the Cloud


In an industry as competitive as eCommerce retail, the ability to turn
data into actionable insights presents the opportunity to make business
decisions that drive more revenue and control costs. Collecting and then
analyzing retail data like customer visits, logistic fulfillment, pricing, and
customer satisfaction presents a multitude of challenges that, if successfully
overcome, can be the difference between a good business and a category leader.

It’s my responsibility as the business intelligence product owner at my organization to help our business become truly data-driven. Today we are the leading online retailer in the Netherlands and Belgium, with over 11 million customers, 23 million items, and over 40,000 partners selling their products. Our 2,000 employees analyze data growing steadily year over year from over 250 data sources using 3,000 workbooks. It’s my job to make sure that all of that data can be analyzed to provide the insight the business needs to make decisions.

We started from a single Oracle BI stack and are now fully deployed in the cloud, and along the way I’ve been part of a central BI team that has learned a lot about how to support the voracious appetite of the business analyst. Over our years of growth and evolution, we’ve identified three critical focal points that every business should consider when satisfying the business’s thirst for data: the right technology, monitoring usage, and continuous improvement.

The Right Technology

Enterprise companies face the challenge of overcoming the limitations of legacy technology while providing the performance necessary to drill into data at scale. The full analytics stack relies on three components: a data warehouse that can support the capacity demands of the business, a modeling platform that provides consistent data definitions analysts can use to drill into data, and a visualization tool to derive the insights that are ultimately used to make business decisions.

The first step in choosing the right technology is to establish the goals of your organization. What are the business outcomes that you’re trying to achieve? As an example, we wanted our organization to be data-driven at scale. Our 2,000 colleagues had to be able to do drill-down analysis on a rapidly growing data volume without having to overly rely on IT.

With your goals established, it’s important to define your technology evaluation criteria.

We settled on three criteria that we felt would drive performance and, ultimately, our business goals. These are: the capacity of the platform, usage of the platform, and the compute cost of the dashboard or data model.

Our evaluation landed us on Google BigQuery as our cloud data warehouse, AtScale for our data modeling and semantic layer, and Tableau for visualization. As a result, our team now generates over 200,000 workbook requests across 3,000 workbooks, modeled through 100 virtual cubes built on 250 data sources.

Monitoring Technology and Usage

Adopting the right cloud technology offers a tremendous opportunity for both cost savings and performance scalability. However, if the technology is used without oversight, there is a very good chance that performance expectations will not be met, and unpredictable costs will erase any of the cloud’s value. This is why it’s incredibly important to implement a monitoring framework to get (and keep) your BI stack in shape.

Performance bottlenecks occur when resource utilization exceeds thresholds at peak loads and user concurrency results in queuing. To identify resource utilization bottlenecks, we’ve set up very detailed, real-time monitoring of our systems. Metrics we track include CPU, memory, disk I/O, network traffic, and query response times.
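
To make that concrete, here is a minimal sketch of the query-latency slice of such monitoring, written in Python against BigQuery's jobs metadata view. The project ID and region below are placeholders rather than our actual configuration, and the same idea extends to the other metrics we track.

```python
# Sketch: pull recent query latencies from BigQuery's job metadata.
# Assumes the google-cloud-bigquery and pandas packages are installed;
# the project ID and region are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-retail-project")  # hypothetical project

SQL = """
SELECT
  job_id,
  user_email,
  TIMESTAMP_DIFF(start_time, creation_time, MILLISECOND) AS queue_ms,  -- time spent waiting
  TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)      AS run_ms,    -- time spent executing
  total_slot_ms,
  total_bytes_processed
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY run_ms DESC
LIMIT 100
"""

slowest = client.query(SQL).to_dataframe()
print(slowest.head(20))  # the day's slowest queries, with queue vs. run time split out
```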

In our experience, the most common bottleneck is user request queues. We’ve found that this can be overcome with small configuration changes in the data platform. In cases where the tuning of the existing environment isn’t enough, the next option is to scale horizontally with more machines or vertically with more powerful machines. This is always the second option, though, as scaling machines is never free!

Without this depth of monitoring, costs can quickly get out of control. In our case, we have to optimize for Google’s costs. Google offers two pricing options for processing data through BigQuery. The first is on-demand pricing, which allows a customer to pay as they go based on the amount of data processed. The second is flat-rate pricing, where there is a fixed fee for guaranteed processing capacity.
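
A back-of-the-envelope comparison makes the trade-off concrete. The sketch below uses illustrative placeholder prices, not Google's current list prices, so check the official pricing page before drawing conclusions for your own workload.

```python
# Back-of-the-envelope comparison of BigQuery pricing models.
# The prices below are illustrative placeholders, not Google's current list prices.
ON_DEMAND_PER_TB = 5.00         # assumed $/TB of data scanned
FLAT_RATE_PER_MONTH = 10_000.0  # assumed monthly fee for a fixed slot commitment

def monthly_on_demand_cost(tb_scanned_per_month: float) -> float:
    """Cost if you pay per terabyte of data processed."""
    return tb_scanned_per_month * ON_DEMAND_PER_TB

def cheaper_option(tb_scanned_per_month: float) -> str:
    """Which model is cheaper for a given monthly scan volume."""
    return "on-demand" if monthly_on_demand_cost(tb_scanned_per_month) < FLAT_RATE_PER_MONTH else "flat-rate"

# Example: an analytics team scanning 3,000 TB a month
print(monthly_on_demand_cost(3000))  # 15000.0
print(cheaper_option(3000))          # flat-rate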

When we first adopted the Google Cloud Platform, we thought the on-demand option was the best fit for us. After seeing the bill over our first three months, we realized we needed to shift to the flat rate. With monitoring in place, we quickly understood how our users were querying data and found that we could support the business with fixed capacity for most of the week and pay for flex capacity during times when processing demand increases. For example, Monday mornings tend to be when the business wants to update their sales reports from the previous week, which creates extra demand for processing power.
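
One way to find those peak windows is to aggregate slot consumption by weekday and hour from the jobs metadata. This is a sketch under the same assumptions as before: the project ID and region are placeholders, and the lookback window is arbitrary.

```python
# Sketch: find the weekly peaks in slot consumption so flex capacity can be
# scheduled only when it is needed. Project ID and region are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-retail-project")  # hypothetical project

SQL = """
SELECT
  EXTRACT(DAYOFWEEK FROM creation_time) AS day_of_week,  -- 1 = Sunday ... 7 = Saturday
  EXTRACT(HOUR FROM creation_time)      AS hour_of_day,
  SUM(total_slot_ms) / 1000 / 3600      AS slot_hours
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY day_of_week, hour_of_day
ORDER BY slot_hours DESC
"""

peaks = client.query(SQL).to_dataframe()
print(peaks.head(10))  # e.g., Monday 08:00-10:00 may dominate the list
```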

Continuously Improve Your Environment

With the right technology and the proper monitoring in place, it’s time to improve the outcomes of the investment. Improvement is a never-ending process. There are a number of initiatives that can make a world of difference for performance, such as adjusting filter settings in a dashboard, updating a data model, improving data preparation, and rewriting code. The answers for where to focus are in the logs.

The logs are a record of what users are experiencing and the impact those experiences have on the technical environment. To improve return on investment, it’s important to map the logs to the drivers of performance and cost. In our case, that means optimizing Google BigQuery’s compute costs, which are measured in slot time. As we reduce slot time, our query performance increases and our cost per query drops.
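
As an illustration of that mapping, slot time can be translated into an approximate cost per query by amortizing the reservation fee over the slot hours actually consumed. The figures below are placeholders, not our real numbers.

```python
# Sketch: convert slot time into an effective cost per query.
# The monthly commitment fee and usage totals are placeholders.
MONTHLY_COMMITMENT = 10_000.0           # assumed flat-rate fee in dollars
total_slot_hours_this_month = 40_000.0  # e.g., summed from the jobs view above

effective_rate = MONTHLY_COMMITMENT / total_slot_hours_this_month  # $/slot-hour

def query_cost(slot_ms: float) -> float:
    """Approximate share of the monthly fee attributable to one query."""
    return (slot_ms / 1000 / 3600) * effective_rate

print(round(query_cost(5_400_000), 4))  # a 1.5 slot-hour query ≈ 0.375 dollars
```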

The easiest way to interpret logs is by visualization. We export all of our logs, load them into Google BigQuery, and query the logs for analysis. That analysis is visualized in meaningful depictions like box plots and scatter plots to help identify areas of improvement. Be careful about using averages, as they don’t provide a good depiction of performance.

Some of the most effective dashboards we run are built on logs that evaluate query execution times for each of our virtual data cubes and each cube’s cost relative to its compute usage. The better you evaluate user logs, the more dramatically you can improve compute costs and execution times.
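
As a sketch of that kind of dashboard, the snippet below reads a hypothetical exported log table (the table and column names are assumptions, not our actual schema) and draws a box plot of execution times per virtual cube, showing the slow tail that an average would hide; the same extract can feed a scatter plot of slot cost per cube.

```python
# Sketch: visualize per-cube query performance from exported logs.
# The table name and column names are assumptions about how logs were exported.
import matplotlib.pyplot as plt
from google.cloud import bigquery

client = bigquery.Client(project="my-retail-project")  # hypothetical project

SQL = """
SELECT cube_name, execution_seconds, slot_hours   -- slot_hours can feed a cost-per-cube scatter plot
FROM `my-retail-project.bi_monitoring.query_log`  -- hypothetical exported log table
WHERE log_date > DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
"""
logs = client.query(SQL).to_dataframe()

# Box plots show the full latency distribution per cube; averages hide the slow tail.
logs.boxplot(column="execution_seconds", by="cube_name", rot=90)
plt.ylabel("query execution time (s)")
plt.suptitle("")
plt.title("Execution time distribution per virtual cube (last 7 days)")
plt.tight_layout()
plt.show()
```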

Putting It All Together

Every company has more data than it knows what to do with; the issue is that most companies don’t know how to use it. Establish a strategy to choose the right technology for your business, monitor that technology to make sure your business is realizing the value of the investment, and then improve upon that technology by understanding how it’s being used by your team. You can learn even more about this strategy from my in-depth webinar on how to increase your cloud analytics ROI. When you’re able to institute these three techniques, you’ll take your business from being data-conscious to data-driven.
