Understanding Metadata

The key to a well-managed and governable data lake is metadata

Organizations looking to harness massive amounts of data are turning to data lakes: single repositories that store all of their raw data, both structured and unstructured. The key to making a data lake successful is using metadata to provide valuable context through tagging and cataloging.

This practical book examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data—where it comes from, where it’s located, and how it’s being used—so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture.

This book also explains the main features of a data lake architecture and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include:

  • Traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab
  • Tooling from open source projects, including Teradata Kylo and Informatica
  • Startups such as Trifacta and Zaloni that provide best-of-breed technology

Interested in setting up a data lake for your organization? How about cleaning up your current data lake? Reach out to us and we'll be happy to help!

About the Authors

Scott Gidley is Vice President of Product Management for Zaloni, where he is responsible for the strategy and roadmap of existing and future products within the Zaloni portfolio. Scott is a nearly 20-year veteran of the data management software and services market. Prior to joining Zaloni, Scott served as senior director of product management at SAS and was previously CTO and cofounder of DataFlux Corporation. Scott received his BS in Computer Science from the University of Pittsburgh.

Federico Castanedo is the Lead Data Scientist at Vodafone Group in Spain, where he analyzes massive amounts of data using artificial intelligence techniques. Previously, he was Chief Data Scientist and co-founder at WiseAthena.com, a start-up that provides business value through artificial intelligence. For more than a decade, he has been involved in projects related to data analysis in academia and industry. He has published several scientific papers about data fusion techniques, visual sensor networks, and machine learning. He holds a Ph.D. in Artificial Intelligence from the University Carlos III of Madrid and has also been a visiting researcher at Stanford University.
