Data Lake Maturity Model

Issue link: https://resources.zaloni.com/i/1078782


• Tools must support access by people with modest technical skills. A web interface might allow some queries to be generated by filling out a form, which is then translated into source code and run by the underlying system. Customizable dashboards can also provide crucial information quickly.

When self-service becomes universal and everyone in the organization is trained to consult the data before making a decision, the organization has become truly data driven. This is a new way of running a business, and we examine the cultural and organizational impacts of this later in this report. Essentially, conversations change from "What do you think?" or "What have you seen like this situation before?" to "What does the data tell us?" and "What data backs up your assertion?" Challenges like those keep people focused on the best interests of the company, clients, and stakeholders when proposing or opposing various courses of action.

Different Zones for Different Purposes

The previous discussion prepared you for one of the most important facets of the data lake's architecture: zones. Zones satisfy the need to give different users data of different types.

If data is expected to be sensitive, it can be loaded into a transient or landing zone so that it can be vetted before it even enters the data lake.

Some staff members need access to the raw data, usually to clean and prep it in ways described earlier in the section "Curation: Cleaning, Prepping, and Provenance." The raw data is also good to store in case you want to prep it a different way or check whether cleaning and prepping introduced errors. For this data, the data lake contains a raw zone.

However, because user queries would be inaccurate if run on the raw data, most users want their data after it has been cleaned and prepped. So, after these curation steps, the prepared data is stored in a gold or trusted zone, which serves the rest of the organization as the "single source of truth."

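
The flow of data through the landing, raw, and trusted zones can be pictured with a minimal Python sketch. It is illustrative only and is not taken from the report or from any product: the names `Zone`, `Dataset`, `ingest`, `vet`, and `curate` are hypothetical, and "vetting" is assumed here to mean applying a caller-supplied redaction function to each row.

```python
from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    LANDING = "landing"  # transient zone where sensitive data is vetted
    RAW = "raw"          # unmodified data, kept for re-prepping and error checks
    TRUSTED = "trusted"  # cleaned and prepped data (the "gold" zone)


@dataclass
class Dataset:
    name: str
    rows: list
    zone: Zone
    history: list = field(default_factory=list)  # zones this data has passed through


def ingest(name, rows, sensitive):
    """Sensitive data starts in the landing zone; other data goes straight to raw."""
    zone = Zone.LANDING if sensitive else Zone.RAW
    return Dataset(name, list(rows), zone, [zone])


def vet(ds, redact):
    """Assumed vetting step: redact each row, then admit the result to the raw zone."""
    if ds.zone is not Zone.LANDING:
        raise ValueError("only landing-zone data is vetted")
    return Dataset(ds.name, [redact(r) for r in ds.rows], Zone.RAW,
                   ds.history + [Zone.RAW])


def curate(ds, clean):
    """Build a trusted copy from raw data; the raw dataset itself is left unchanged."""
    if ds.zone is not Zone.RAW:
        raise ValueError("curation starts from the raw zone")
    return Dataset(ds.name, [clean(r) for r in ds.rows], Zone.TRUSTED,
                   ds.history + [Zone.TRUSTED])


# Hypothetical example data: one order with an email address and an untidy total.
landed = ingest("orders", [{"id": 1, "email": "a@example.com", "total": " 10.5 "}],
                sensitive=True)
raw = vet(landed, lambda r: {**r, "email": "<redacted>"})
trusted = curate(raw, lambda r: {**r, "total": float(r["total"])})
```

Because `curate` returns a new dataset rather than modifying its input, the raw copy remains available if the data later needs to be prepped a different way or checked for errors introduced during cleaning.
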
To run queries, users can employ programmers, use a growing array of commercial tools, or access web interfaces to create a limited set of new queries. But different users might want data to be represented with different schemas, and some users just want aggregated
