Understanding metadata scalable data architecture book

Issue link: https://resources.zaloni.com/i/790575

rity and privacy rules. Metadata discovery upon ingest can often identify PII or sensitive data for masking.

Next, after applying metadata you have the flexibility to create other useful zones:

• Refined Zone: Based on the metadata and the structure of the data, you may want to take some of the raw datasets and transform them into refined datasets that you may need for various use cases. You also can define some new structures for your common data models and do some data cleansing or validation using metadata.
• Trusted Zone: If needed, you could create some master datasets and store them in what is called "the Trusted Data Zone" area of the data lake. These master data sets may include frequently accessed reference data libraries, allowable lists of values, or product or state codes. These datasets are often combined with refined datasets to create analytic data sets that are available for consumption.
• Sandbox: An area for your data scientists or your business analysts to play with the data and again, leverage metadata to more quickly know how fresh the datasets are, assess the quality of the data, etc., in order to build more efficient analytical models on top of the data lake.

Finally, on the righthand side of the sample architecture, you have the Consumption Zone. This zone provides access to the widest range of users within an organization.

Data Lake Management Solutions

For organizations considering a data lake, there are big data tools, data management platforms, and industry-specific solutions available to help meet overall data governance and data management requirements. Organizations that are early adopters or heavily IT-driven may consider building a data lake by stitching together the plethora of tooling available in the big data ecosystem. This approach allows for maximum flexibility, but incurs higher maintenance costs as the use cases and ecosystem change.
Another approach is to leverage existing data management solutions that are in place, and augment them with solutions for metadata, self-service data preparation, and other areas of need. A third option is to implement an end-to-end data management platform that is built natively for the big data ecosystem.
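
To make the zone flow described earlier more concrete, here is a minimal Python sketch of metadata discovery on ingest, masking of the flagged columns for a refined dataset, and a freshness check of the kind a sandbox user might run. Everything in it is an assumption made for illustration: the `DatasetMetadata` fields, the two regular expressions, and the function names are hypothetical and do not come from the book or from any particular data lake product. Real PII discovery would need far richer rules than these two patterns.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical, deliberately simple patterns for two kinds of sensitive values.
PII_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


@dataclass
class DatasetMetadata:
    """Illustrative metadata record attached to a dataset in the lake."""
    name: str
    zone: str                      # e.g. "raw", "refined", "trusted", "sandbox"
    ingested_at: datetime
    pii_columns: dict = field(default_factory=dict)  # column -> detected PII type


def discover_pii(rows):
    """Return {column: pii_type} for columns where any value fully matches a pattern."""
    found = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.fullmatch(value):
                    found.setdefault(column, pii_type)
    return found


def mask(rows, pii_columns):
    """Return copies of the rows with every flagged column replaced by '***'."""
    return [
        {col: ("***" if col in pii_columns else val) for col, val in row.items()}
        for row in rows
    ]


def is_fresh(meta, max_age, now=None):
    """True if the dataset was ingested no longer than max_age before `now`."""
    now = now or datetime.now(timezone.utc)
    return now - meta.ingested_at <= max_age


# Example: ingest into the raw zone, then derive a masked refined dataset.
raw_rows = [
    {"customer": "Ada", "contact": "ada@example.com", "state": "NC"},
    {"customer": "Grace", "contact": "grace@example.com", "state": "VA"},
]
raw_meta = DatasetMetadata(
    name="customers",
    zone="raw",
    ingested_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
raw_meta.pii_columns = discover_pii(raw_rows)

refined_rows = mask(raw_rows, raw_meta.pii_columns)
refined_meta = DatasetMetadata(
    name="customers_refined",
    zone="refined",
    ingested_at=raw_meta.ingested_at,
    pii_columns=dict(raw_meta.pii_columns),
)
```

The sketch keeps the PII findings on the metadata record instead of inside the data, so a downstream zone or a sandbox user can decide how to treat a dataset by reading its metadata alone, for instance by calling `is_fresh(refined_meta, timedelta(days=7))` before building a model on it.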
