Click to learn more about author Haoyuan Li.
The present global crisis has bolstered cries
for cost-cutting across organizations. Data lakes spanning from on-premises
environments to a public cloud platform have continued to evolve, frequently
striving to keep the infrastructure and operational costs low while providing
At many large organizations, traditional data
lakes were established on-premises with complex workflows in place spanning
different business units. The on-premises infrastructure in these environments
is oftentimes stressed, leading to an increase in the total cost of
infrastructure. At the same time, new and unanticipated workloads are rapidly
being onboarded in these data-driven organizations.
A completely on-premises infrastructure will
fail to keep up because of the time it takes to provision new infrastructure and
the heavy operational cost to maintain every piece of hardware and software
acquired. Even though the promise of a public cloud vendor with a managed
elastic infrastructure sounds great, the costs quickly start to add up as we
scale up in this scenario.
As the amount of data continues to grow, the
key to keeping costs under control is to remain flexible. By being prepared for
pieces of infrastructure to be spread across an on-premises data lake and a
public cloud, you will be able to get the best of both worlds. But it’s easier
said than done. Here are five recommendations for how to leverage a hybrid
A complete lift of the on-premises environment
and migration to a public cloud may sound scary. And oftentimes, the benefits
of an elastic compute infrastructure are outweighed by expensive storage,
network, and operational costs. Keep a foot in both worlds by migrating some of
your workloads from a busy on-premises data lake to compute in the
cloud. Be prepared for data and compute infrastructure to be spread across the
on-premises environment and public clouds.
Adopting cloud-native computing in a naive
manner may necessitate application rewrites, copying of data to cloud storage,
and redefinition of structured data catalogs. Such a complex migration is
laborious and expensive. Abstraction is the key to remaining agnostic to the
infrastructure provider at all layers of the technology stack. Container
orchestration future-proofs the application layer so that workloads can be
migrated across infrastructure providers when needed. But data has gravity, and moving data is
not immune to network and storage costs. Similarly, a data orchestration
layer decouples applications from the physical location of data to optimize which data
resides where and for how long.
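
To make the decoupling idea concrete, here is a minimal sketch of a logical-to-physical path mapping. It is an illustration only, not the API of any particular product: the class `DataOrchestrator`, its `mount` and `resolve` methods, and the example URIs are all hypothetical names chosen for this example.

```python
class DataOrchestrator:
    """Hypothetical catalog that maps logical paths to physical storage URIs."""

    def __init__(self):
        self._mounts = {}  # logical prefix -> physical base URI

    def mount(self, logical_prefix, physical_uri):
        # Normalize trailing slashes so path joins stay predictable.
        self._mounts[logical_prefix.rstrip("/")] = physical_uri.rstrip("/")

    def resolve(self, logical_path):
        # Pick the longest mounted prefix that covers the requested path.
        matches = [
            prefix for prefix in self._mounts
            if logical_path == prefix or logical_path.startswith(prefix + "/")
        ]
        if not matches:
            raise KeyError(f"no mount covers {logical_path!r}")
        prefix = max(matches, key=len)
        return self._mounts[prefix] + logical_path[len(prefix):]


orchestrator = DataOrchestrator()
# Assumed example locations; the application only ever sees "/sales/...".
orchestrator.mount("/sales", "hdfs://onprem-namenode:8020/warehouse/sales")
before = orchestrator.resolve("/sales/2020/q1.parquet")

# Moving the data to cloud storage changes the mount, not the application.
orchestrator.mount("/sales", "s3://example-bucket/sales")
after = orchestrator.resolve("/sales/2020/q1.parquet")
```

The application keeps asking for `/sales/2020/q1.parquet` in both cases; only the mount table knows whether the bytes live on-premises or in the cloud.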
Forget About Data Locality
A key premise for the initial data analytics
ecosystem was that data locality brings you performance. In the scenario where
compute workloads are migrated to a cloud and separated from storage, there is
no locality. Performance gains in the context of a public cloud translate to
cost savings achieved by elastically scaling down compute when it is not needed. A
highly distributed caching capability automatically orchestrates hot data to be
closer to compute for performance while keeping cold data in cheaper storage.
Caching also eliminates repeated network transfers and the associated cloud
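
The effect of caching on repeated network transfers can be sketched with a small least-recently-used cache placed in front of remote storage. This is a single-process toy under stated assumptions, not a distributed cache: `HotDataCache`, `fetch_remote`, and the `remote_reads` counter are hypothetical names, and LRU is just one possible eviction policy.

```python
from collections import OrderedDict


class HotDataCache:
    """Toy LRU cache that counts how often remote storage is actually read."""

    def __init__(self, fetch_remote, capacity):
        self._fetch_remote = fetch_remote  # callable standing in for a remote read
        self._capacity = capacity
        self._blocks = OrderedDict()       # key -> data, oldest entry first
        self.remote_reads = 0

    def read(self, key):
        if key in self._blocks:
            # Cache hit: mark the block as recently used, no network transfer.
            self._blocks.move_to_end(key)
            return self._blocks[key]
        # Cache miss: pay for one remote transfer and keep the block nearby.
        self.remote_reads += 1
        value = self._fetch_remote(key)
        self._blocks[key] = value
        if len(self._blocks) > self._capacity:
            # Evict the least recently used block; it stays in remote storage.
            self._blocks.popitem(last=False)
        return value


# The lambda is a stand-in for reading from on-premises or object storage.
cache = HotDataCache(fetch_remote=lambda key: key.upper(), capacity=2)
for key in ["a", "a", "b", "a", "c", "b"]:
    cache.read(key)
```

Six reads result in four remote transfers here; the two repeated reads of `a` are served locally, which is the saving the paragraph above describes.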
Policies for Everything (as Much as Possible)
Each workload is unique, with different resource usage patterns. Elasticity in the cloud demands policies for those specific workloads, both for compute and storage. Use auto-scaling policies to control when and for how long to keep compute resources up. Similarly, employ data management policies to determine which data is migrated, when, and where, to truly enable a hybrid cloud environment.
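
As a rough illustration of what such policies might look like when written down as code, the sketch below pairs a compute scaling rule with a data placement rule. All names, tiers, and thresholds (`ScalingPolicy`, `DataPolicy`, `"cloud-cache"`, `"on-prem-archive"`, the seven-day window) are assumptions made up for this example, not settings from any specific platform.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical per-workload rule for how many workers to keep running."""
    min_workers: int
    max_workers: int
    tasks_per_worker: int

    def desired_workers(self, pending_tasks):
        # Size the cluster to the backlog, clamped to the policy bounds.
        needed = math.ceil(pending_tasks / self.tasks_per_worker)
        return max(self.min_workers, min(self.max_workers, needed))


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical rule for where a dataset should live based on recency."""
    hot_within_days: int

    def placement(self, days_since_last_access):
        # Recently used data stays near cloud compute; the rest stays on-premises.
        if days_since_last_access <= self.hot_within_days:
            return "cloud-cache"
        return "on-prem-archive"


nightly_etl = ScalingPolicy(min_workers=0, max_workers=10, tasks_per_worker=5)
idle = nightly_etl.desired_workers(pending_tasks=0)
busy = nightly_etl.desired_workers(pending_tasks=12)
burst = nightly_etl.desired_workers(pending_tasks=1000)

reports = DataPolicy(hot_within_days=7)
recent = reports.placement(days_since_last_access=3)
stale = reports.placement(days_since_last_access=30)
```

Allowing `min_workers=0` lets an idle workload scale to nothing, which is where the cost savings from elasticity come from; a latency-sensitive workload would likely set a higher floor.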
While open-source software and abstraction are
important and can help avoid potential vendor lock-in, it is also critical to
provide the necessary security features in your environment to protect data in
the organization. Plan security for data in motion and at rest with secure
access methods over a wide area network. Integrate with both on-premises and
cloud components to prevent stumbling on this final hurdle.
Is your organization planning for future growth? Is a
hybrid cloud on the horizon? Does a hybrid cloud make sense, but are you still
skeptical? Choose the right tools for your workloads and re-envision your cloud