The jumping-off point for Hadoop is a single proof of concept: start small and show business value for a specific use case using easily accessible data sets. Then do it again. And again. And then, after you’ve successfully deployed your first few use cases into production, it’s easier to gain support from business partners for larger projects. Your business partners want to be sure they’ll get a return on the investment they’re making in Hadoop.
Implementing Hadoop one small step at a time isn’t glamorous, but it’s the most sustainable and successful way to build your data lake, or as Gartner analyst Merv Adrian puts it, your data reservoir. Think of it as an ever-evolving journey, one without a fixed end point.
I have a data lake, now what?
Once your data is in the data lake, the journey continues. Adrian’s concept of the data reservoir is a useful one to illustrate why. Unlike a lake, a reservoir has a dam. The dam controls when and how much water flows out of the reservoir. In his keynote presentation at the 2014 Hadoop Summit, Adrian offered this description: “A reservoir contains water that is managed, transformed, filtered, secured (somewhat), portable, potable (fit for consumption).” Applying data management tools to your Hadoop data lake is sort of like building a data dam to control the data and make it consumable for end users.
Data quality: your work is never done
Even unstructured data needs to be qualified so that you know what you have and can find and use it. Applying metadata to your data in Hadoop is absolutely critical. Only once data has been analyzed, verified and prepared can your business realistically use it for analytics. Although qualifying your data is a never-ending process, with a good Hadoop data management platform you can operationalize and automate much of it to make it more efficient.
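To make that concrete, much of this qualification can be scripted rather than done by hand. The sketch below is a minimal, hypothetical illustration (not tied to Zaloni's platform or any specific Hadoop tool): a Python function that captures basic technical metadata, record counts and per-field null counts, for a batch of records before they land in the lake. The function and dataset names are invented for this example.

```python
import csv
import io
from datetime import datetime, timezone

def profile_dataset(name, rows):
    """Capture basic technical metadata for a list of row dicts:
    record count, field names, and per-field null counts.
    (Illustrative sketch; a real pipeline would persist this to a
    metadata catalog rather than return it.)"""
    fields = sorted({k for row in rows for k in row})
    null_counts = {
        f: sum(1 for row in rows if not row.get(f)) for f in fields
    }
    return {
        "dataset": name,
        "profiled_at": datetime.now(timezone.utc).isoformat(),
        "record_count": len(rows),
        "fields": fields,
        "null_counts": null_counts,
    }

# Example: profile a small CSV extract with some missing values.
raw = "id,region,amount\n1,east,100\n2,,250\n3,west,\n"
rows = list(csv.DictReader(io.StringIO(raw)))
meta = profile_dataset("sales_extract", rows)
print(meta["record_count"], meta["null_counts"])
```

Running a profile like this automatically on every ingest is one way to operationalize data qualification: the resulting metadata tells consumers what a data set contains and how complete it is before they build analytics on it.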
Experts at the Hadoop Summit agreed that Hadoop is on the brink of going mainstream. Adrian predicted that the future of Hadoop is for enterprises to more effectively exploit it, saying, “The synergy that Hadoop brings to the table is most powerfully magnified when it’s connected to what’s already there – that’s what’s next.” And to be able to connect it, your Hadoop data lake must be tamed with a data governance strategy. This requires – yes, you guessed it – a robust Hadoop data management solution.
If you’ve successfully implemented some Hadoop pilot projects, now’s the time to consider where the journey goes from here. As you plan for deployment, be sure to simultaneously develop a strategy for data management that will enable your Hadoop deployment to scale and your business to realize its return on investment.
About the Author
Ben Sharma is CEO and co-founder of Zaloni. He is a passionate technologist with experience in business development, solutions architecture, and service delivery of big data, analytics and enterprise infrastructure solutions. Having previously worked in management positions for NetApp, Fujitsu and others, Ben’s expertise ranges from business development to production deployment in a wide array of technologies including Hadoop, HBase, databases, virtualization and storage. Ben is the co-author of Java in Telecommunications and holds two patents. He received his MS in Computer Science from the University of Texas at Dallas.