The key to a well-managed and governable data lake is metadata
Organizations looking to harness massive amounts of data are turning to the data lake, a single repository for storing all of their raw data, both structured and unstructured. The key to making a data lake successful is using metadata to provide valuable context through tagging and cataloging.
This practical book examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data—where it comes from, where it’s located, and how it’s being used—so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture.
This book also explains the main features of a data lake architecture, and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include:
- Traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab
- Tooling from open source projects, including Teradata Kylo and Informatica
- Startups such as Trifacta and Zaloni that provide best-of-breed technology
About the Authors
Scott Gidley is Vice President of Product Management for Zaloni, where he is responsible for the strategy and roadmap of existing and future products within the Zaloni portfolio. Scott is a nearly 20-year veteran of the data management software and services market. Prior to joining Zaloni, Scott served as senior director of product management at SAS and was previously CTO and cofounder of DataFlux Corporation. Scott received his BS in Computer Science from the University of Pittsburgh.
Federico Castanedo is the Lead Data Scientist at Vodafone Group in Spain, where he analyzes massive amounts of data using artificial intelligence techniques. Previously, he was Chief Data Scientist and cofounder at WiseAthena.com, a startup that provides business value through artificial intelligence. For more than a decade, he has been involved in projects related to data analysis in academia and industry. He has published several scientific papers about data fusion techniques, visual sensor networks, and machine learning. He holds a Ph.D. in Artificial Intelligence from the University Carlos III of Madrid and has also been a visiting researcher at Stanford University.