The jumping-off point for Hadoop is a single proof of concept: start small and show business value for a specific use case using easily accessible data sets. Then do it again. And again. Once you’ve successfully deployed your first few use cases into production, it becomes easier to gain support from business partners for larger projects. Your business partners want to be sure they’ll get a return on the investment they’re making in Hadoop.
Implementing Hadoop one small step at a time isn’t glamorous, but it’s the most sustainable and successful way to build your data lake, or, as Gartner analyst Merv Adrian puts it, your data reservoir. Think of it as an ever-evolving journey, one that never really ends.
I have a data lake. Now what?
Once your data is in the data lake, the journey continues. Adrian’s concept of the data reservoir helps illustrate why. Unlike a lake, a reservoir has a dam, which controls when and how much water flows out. In his keynote presentation at the 2014 Hadoop Summit, Adrian offered this description: “A reservoir contains water that is managed, transformed, filtered, secured (somewhat), portable, potable (fit for consumption).” Applying data management tools to your Hadoop data lake is like building a dam for your data: it controls the data and makes it consumable for end users.
Data quality: your work is never done
Even unstructured data needs to be qualified so that you know what you have and can find and use it. Applying metadata to your data in Hadoop is absolutely critical. Only once data has been analyzed, verified and prepared can your business realistically use it for analytics. Although qualifying your data is a never-ending process, a good Hadoop data management platform lets you operationalize and automate much of it, making the work more efficient.
Experts at the Hadoop Summit agreed that Hadoop is on the brink of going mainstream. Adrian predicted that Hadoop’s future lies in enterprises exploiting it more effectively, saying, “The synergy that Hadoop brings to the table is most powerfully magnified when it’s connected to what’s already there – that’s what’s next.” To connect it, however, you must first tame your Hadoop data lake with a data governance strategy. And that requires, yes, you guessed it, a robust Hadoop data management solution.
If you’ve successfully implemented some Hadoop pilot projects, now’s the time to consider where the journey goes from here. As you plan for deployment, develop a data management strategy in parallel, so that your Hadoop deployment can scale and your business can realize a return on its investment.
Talk with us, and together we can explore what's next in your Hadoop journey.
About the Author: Ben Sharma