Data Lake in the Cloud, Hybrid, or On-Premises?

March 26, 2018 Ben Sharma

This article is an excerpt from "Architecting Data Lakes: Second Edition" by Ben Sharma.

In the past, most data lakes resided on-premises. That has shifted dramatically in recent years, with most companies now looking to the cloud to replace or augment their on-premises implementations.

Whether to use on-premises or cloud storage and processing is a complicated and important decision for any organization. The pros and cons of each could fill a book and depend heavily on the individual implementation. Generally speaking, on-premises storage and processing offer tighter control over data security and data privacy, whereas public cloud systems offer highly scalable, elastic storage and computing resources that meet enterprises’ need for large-scale processing and data storage without the overhead of provisioning and maintaining expensive infrastructure.

With the tools and technologies in the ecosystem changing rapidly, we have also seen many cloud-based data lakes used as incubators for dev/test environments, where teams can quickly evaluate new tools and technologies before picking the right one to bring into production, whether in the cloud or on-premises.

If you put a robust data management structure in place, one that provides complete metadata management, you can easily enable any combination of on-premises storage, cloud storage, and multi-cloud storage.
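
As a rough illustration of why metadata management makes mixed storage workable, here is a minimal Python sketch of a catalog that maps a logical dataset name to its physical copies. Everything in it is an assumption made for illustration: the MetadataCatalog and Replica classes, the environment labels ("on_prem", "cloud_a", "cloud_b"), and the example URIs are hypothetical and do not come from the book or from any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    """One physical copy of a dataset (all values are illustrative)."""
    environment: str  # hypothetical label, e.g. "on_prem" or "cloud_a"
    uri: str          # where that copy lives


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: logical dataset name -> physical copies."""
    replicas: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, dataset: str, environment: str, uri: str) -> None:
        """Record that a copy of `dataset` exists in `environment`."""
        self.replicas.setdefault(dataset, []).append(Replica(environment, uri))

    def environments(self, dataset: str) -> List[str]:
        """List the environments holding a copy, in registration order."""
        return [r.environment for r in self.replicas.get(dataset, [])]

    def resolve(self, dataset: str, preferred: List[str]) -> str:
        """Return the URI of the first copy found in the preferred order."""
        copies = {r.environment: r.uri for r in self.replicas.get(dataset, [])}
        for env in preferred:
            if env in copies:
                return copies[env]
        raise LookupError(f"no copy of {dataset!r} in any of {preferred}")


# Made-up datasets and locations, spread across three environments.
catalog = MetadataCatalog()
catalog.register("customer_events", "on_prem",
                 "hdfs://namenode.example.internal/lake/customer_events")
catalog.register("customer_events", "cloud_a",
                 "s3://example-bucket/lake/customer_events")
catalog.register("clickstream", "cloud_b",
                 "gs://example-bucket/lake/clickstream")

# A job asks for data by logical name and states where it prefers to read.
print(catalog.resolve("customer_events", ["cloud_a", "on_prem"]))
```

In this sketch, consumers refer to data only by logical name, so moving or copying a dataset between environments means updating the catalog rather than every job that reads it.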

About the Author

Ben Sharma

Ben Sharma is CEO and co-founder of Zaloni. He is a passionate technologist with experience in business development, solutions architecture, and service delivery of big data, analytics, and enterprise infrastructure solutions. Having previously worked in management positions for NetApp, Fujitsu, and others, Ben’s expertise ranges from business development to production deployment in a wide array of technologies, including Hadoop, HBase, databases, virtualization, and storage. Ben is the co-author of "Java in Telecommunications" and holds two patents. He received his MS in Computer Science from the University of Texas at Dallas.
