Understanding metadata scalable data architecture book

Issue link: https://resources.zaloni.com/i/790575


using the data, when the data changed, and why and how the data is changing.

Conclusion

Since most of the time spent on data analysis projects goes to identifying, cleansing, and integrating data, and that effort is magnified when data is stored across many silos, the investment in building a data lake is worthwhile. With a data lake you can significantly reduce the effort of finding datasets, preparing them for analysis, and regularly refreshing them to keep them up to date.

Developing next-generation data architectures is a difficult task because you must take into account the format, protocol, and standards of the input data, and ensure the veracity and validity of the information while respecting security constraints and privacy. Sometimes it is very difficult to build everything a data lake requires from scratch, and most of the time the work must be performed in phases. In a next-generation data architecture, the focus shifts over time from data ingestion to transformation, and then to analytics.

As more consumers across an organization want to access and utilize data for various business needs, and enterprises in regulated industries look for ways to enable that in a controlled fashion, metadata, as an integral part of any big data strategy, is starting to get the attention it deserves.

Because a data lake democratizes data, ample value can be obtained from the way that data is used and enriched, with metadata providing a way to share discoveries with peers and other domain experts.

But data governance in the data lake is key. Data lakes must be architected properly to leverage metadata and integrate with existing metadata tools; otherwise they will create a hole in an organization's data governance processes, because how data is used, transformed, and related outside the data lake can be lost.
An incorrect metadata architecture can often prevent data lakes from making the transition from an analytical sandbox to an enterprise data platform. Building next-generation data architectures requires effective metadata management capabilities in order to operationalize the data
