“Between 2010 and 2020, the amount of data will increase by 20 times, with 77% of the data relevant to organizations being unstructured.” ~IDC
Even as the market's enthusiasm for modern big data architecture grows, the promise of the data lake continues to elude many. Still, mainstream production implementations in industries as varied as retail, healthcare, and financial services are becoming increasingly common. As these organizations realize success, more are lured by the promise of transforming data into real, actionable business insight.
To ensure that a data lake implementation is a strategic core component of a modern big data architecture, enterprise organizations must be sure their data lake implementation does two things:
1. Meets enterprise service-level agreements (SLAs) for scalability, performance, security, and availability.
2. Addresses and manages each stage of the data value chain.
Today it is entirely feasible to find a middle ground: a data lake implementation that is flexible, scalable, and cost-effective, while introducing some of the rigor of traditional architectures such as enterprise data warehouse (EDW) deployments.
In the paper “Realizing the Promise of a Hadoop Data Lake,” Paul Miller and Kelly Schupp explain how you can realize the promise of a data lake and prevent it from becoming a data swamp.