Data Lake Maturity Model

Issue link: https://resources.zaloni.com/i/1078782


Level 1: Ignore

Data at the Ignore level

The organization preserves whatever information it can as structured data, using traditional relational databases or data marts. The organization is operating with limited datasets collected for immediate purposes such as sales leads. Much of the data is internally collected and might even be entered manually. For instance, after a client visit, the company representative might sit down at her computer and enter data about the client and their visit. This limits the amount of data that a company can feasibly collect.

Even the data entered by staff might go unused, especially if it's a plain-text description of an encounter. That counts as unstructured data, which is too loosely organized for automated programs to process. For instance, it's difficult for health clinics to run programs over their records to detect which patients have worsened and are at risk of hospitalization. Part of the big data revolution is the use of natural language processing (NLP) to extract such insights, but it must be used along with other advanced analytics that aren't employed by organizations at the Ignore level.

Data is of poor quality, because the company lacks the tools to perform the cleaning described in "Cleaning data" on page 10. Furthermore, the data is stored in silos. For instance, facilities planners can't get into the sales databases to find out whether sales have grown enough (or shrunk enough) to justify refashioning the corporate building. There is no point in running automated programs over multiple datasets because they differ so much in schema.

Some organizations do manage to combine siloed data in an enterprise data warehouse, but it is expensive and difficult to maintain. For instance, to incorporate a new dataset, its data must be shoehorned into the schema used by the data warehouse, or the data warehouse must be upgraded (a complicated and risky undertaking) to accommodate different data.
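To make the cost of shoehorning concrete, here is a minimal Python sketch of forcing a record from a new dataset into a fixed warehouse schema. The schema, the column names, the helper fit_to_schema, and the sample record are all hypothetical, invented for illustration; real warehouse loading tools work differently in detail.

```python
# Hypothetical warehouse schema: column name -> type the warehouse expects.
WAREHOUSE_SCHEMA = {"client_id": int, "visit_date": str, "amount": float}


def fit_to_schema(record, schema=WAREHOUSE_SCHEMA):
    """Force one record into the fixed schema.

    Returns (row, dropped): `row` has exactly the schema's columns (None where
    the source had no value), and `dropped` lists the source fields that have
    no matching column and are therefore lost.
    """
    row = {}
    for column, expected_type in schema.items():
        value = record.get(column)
        row[column] = expected_type(value) if value is not None else None
    dropped = sorted(set(record) - set(schema))
    return row, dropped


# A record from a new, differently shaped dataset (illustrative values only).
new_record = {
    "client_id": "1042",
    "amount": "250.00",
    "visit_notes": "Client asked about a larger office.",
    "region": "EMEA",
}

row, dropped = fit_to_schema(new_record)
print(row)      # the row as the warehouse would store it
print(dropped)  # fields that had nowhere to go
```

In this sketch the free-text visit notes and the region have no column to land in, so they are discarded unless someone changes the warehouse schema itself, which is the risky upgrade described above.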
If analytics are employed, they are limited by the narrow range of available data.

A final barrier to effective data use at the Ignore level is the relegation of old data to archives. As we saw in the earlier material on the three types of analytics, data collected over time can be very powerful. But the databases used by organizations at the Ignore level tend to be expensive and run more slowly when storing lots of data. So, the organizations
