Understanding metadata scalable data architecture book

Issue link: https://resources.zaloni.com/i/790575


Building a data lake is not a simple process: you must decide which data to ingest and how to organize and catalog it. Although the process is not automatic, there are tools and products that simplify the creation and management of a modern data lake architecture at enterprise scale. These tools allow ingestion of different types of data, including streaming, structured, and unstructured; they also allow application and cataloging of metadata to provide a better understanding of the data you have already ingested or plan to ingest. All of this allows you to create the foundation for an agile data lake platform.

For more information about building data lakes, download the free O'Reilly report Architecting Data Lakes.

What Is Metadata and Why Is It Critical in Today's Data Environment?

Modern data architectures promise to give an increasing number of data consumers within an organization access to more and different types of data. Without proper governance, enabled by a strong foundation of metadata, these architectures often show initial promise but ultimately fail to deliver.

Let's take logistics distribution as an analogy to explain metadata and why it's critical in managing the data in today's business environment. When you ship a package to an international destination, you want to know where along the route the package is located in case something goes wrong with the delivery. Logistics companies keep manifests to track the movement of packages and their successful delivery along the shipping process.

Metadata provides this same type of visibility into today's data-rich environment. Data is moving in and out of companies, as well as within companies. Tracking data changes and detecting any process that causes problems when you are doing data analysis is hard if you don't have information about the data and the data movement process.
Today, even the change of a single column in a source table can impact hundreds of reports that use that data, making it extremely important to know beforehand which columns will be affected. Metadata provides information about each dataset, such as its size, the schema of a database, format, last modified time, access control lists, and usage. The use of metadata enables the management of a scala‐
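
To make the column-change scenario concrete, here is a minimal sketch in Python of a catalog entry that carries the dataset attributes named above (size, schema, format, last modified time, access control lists) and a lookup that finds the reports reading a given column. The names DatasetMetadata, Report, and impacted_reports, along with the sample dataset and report names, are hypothetical and chosen only for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry with the attributes listed in the text."""
    name: str
    size_bytes: int
    data_format: str
    schema: Dict[str, str]                 # column name -> column type
    last_modified: datetime
    access_control: List[str] = field(default_factory=list)


@dataclass
class Report:
    """Hypothetical downstream report and the source columns it reads."""
    name: str
    reads: Set[Tuple[str, str]]            # (dataset name, column name)


def impacted_reports(reports: List[Report], dataset: str, column: str) -> List[str]:
    """Return the sorted names of reports that read the given column."""
    return sorted(r.name for r in reports if (dataset, column) in r.reads)


# Illustrative sample data (assumed, not taken from the source).
orders = DatasetMetadata(
    name="orders",
    size_bytes=1_048_576,
    data_format="parquet",
    schema={"order_id": "bigint", "customer_id": "bigint", "amount": "decimal(10,2)"},
    last_modified=datetime(2018, 1, 15, tzinfo=timezone.utc),
    access_control=["analysts:read", "etl:write"],
)

reports = [
    Report("daily_revenue", {("orders", "amount"), ("orders", "order_id")}),
    Report("customer_churn", {("orders", "customer_id")}),
    Report("top_customers", {("orders", "customer_id"), ("orders", "amount")}),
]

# Before changing orders.amount, list the reports that depend on it.
print(impacted_reports(reports, "orders", "amount"))
# prints ['daily_revenue', 'top_customers']
```

In this sketch the mapping from reports to columns is entered by hand; a real catalog would more likely derive it from query logs or lineage capture, but the lookup before a schema change would follow the same idea.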
