Data Lake in the Cloud, Hybrid, or On-Premise?

March 26, 2018 Ben Sharma

This article is an excerpt from "Architecting Data Lakes: Second Edition" by Ben Sharma. Get the full ebook today!

In the past, most data lakes resided on-premises. That has shifted dramatically in recent years, with most companies now looking to the cloud to replace or augment their implementations.

Whether to use on-premises or cloud storage and processing is a complicated and important decision for any organization. The pros and cons of each could fill a book and depend heavily on the individual implementation. Generally speaking, on-premises storage and processing offer tighter control over data security and data privacy, whereas public cloud systems offer highly scalable, elastic storage and computing resources that meet enterprises’ need for large-scale processing and data storage without the overhead of provisioning and maintaining expensive infrastructure.

With tools and technologies in the ecosystem changing rapidly, we have also seen many cloud-based data lakes used as incubators for dev/test environments, where teams can evaluate new tools quickly before picking the right one to bring into production, whether in the cloud or on-premises.

If you put a robust data management structure in place, one that provides complete metadata management, you can enable any combination of on-premises storage, cloud storage, and multi-cloud storage easily.
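To make the role of metadata concrete, here is a minimal sketch of a catalog that maps one logical dataset name to every physical copy of it, wherever that copy lives. It is an illustration only, not the book's method or any product's API: the `MetadataCatalog` and `DatasetEntry` classes, the `environment_of` helper, the scheme-to-environment mapping, and the example URIs are all hypothetical, and a real catalog would also track schema, lineage, and access policies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Assumption: the URI scheme alone is enough to tell where a copy lives.
SCHEME_TO_ENVIRONMENT: Dict[str, str] = {
    "hdfs": "on-premises",
    "s3": "cloud",
    "gs": "cloud",
}


def environment_of(uri: str) -> str:
    """Classify a storage URI as on-premises, cloud, or unknown."""
    return SCHEME_TO_ENVIRONMENT.get(urlparse(uri).scheme, "unknown")


@dataclass
class DatasetEntry:
    """Metadata for one logical dataset and every physical copy of it."""
    name: str
    owner: str
    locations: List[str] = field(default_factory=list)


class MetadataCatalog:
    """Hypothetical in-memory catalog keyed by logical dataset name."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, name: str, owner: str, location: str) -> None:
        """Record a physical copy of a dataset, creating the entry if needed."""
        entry = self._entries.setdefault(name, DatasetEntry(name, owner))
        if location not in entry.locations:
            entry.locations.append(location)

    def locate(self, name: str, environment: Optional[str] = None) -> List[str]:
        """Return all known copies, optionally filtered by environment."""
        entry = self._entries[name]
        if environment is None:
            return list(entry.locations)
        return [uri for uri in entry.locations if environment_of(uri) == environment]


# Usage: one dataset with an on-premises copy and a cloud copy.
catalog = MetadataCatalog()
catalog.register("sales_orders", "finance", "hdfs://namenode.example.internal/lake/sales_orders")
catalog.register("sales_orders", "finance", "s3://example-lake-bucket/sales_orders")
print(catalog.locate("sales_orders", environment="cloud"))
```

Because consumers ask for `sales_orders` by name rather than by path, a copy can be added in another cloud, or moved between environments, without changing the code that reads it.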

About the Author

Ben Sharma

Ben Sharma is CEO and co-founder of Zaloni. He is a passionate technologist with experience in business development, solutions architecture, and service delivery of big data, analytics, and enterprise infrastructure solutions. Having previously worked in management positions for NetApp, Fujitsu, and others, Ben’s expertise ranges from business development to production deployment in a wide array of technologies including Hadoop, HBase, databases, virtualization, and storage. Ben is the co-author of Java in Telecommunications and holds two patents. He received his MS in Computer Science from the University of Texas at Dallas.
