Data Lake Maturity Model

Issue link: https://resources.zaloni.com/i/1078782


tion (everybody has a cloud nowadays) and differentiate themselves, provide attractive services that make sure everything is set up and working for you. Some of these cloud services collect popular open source tools such as those we looked at earlier, whereas others develop their own tools along similar lines as the open source options.

But the onslaught of new tools will likely never end, because new business and technical conditions create the need for new solutions. The particular ways programmers are handling data today, which you have read about in this section, will help you make sense of data lakes, which we examine in the next section.

User Needs and Data Lake Architectures

We have seen how data is stored in a data lake and how it is processed. It's important, however, to provide many different forms of data and to control access to authorized people. Curation and architectures are the subject of this section.

Curation: Cleaning, Prepping, and Provenance

As it pertains to the IT industry, curation means putting things in a suitable order for use. In terms of datasets, it covers a number of tasks in cleaning, prepping, and preserving the provenance (also known as lineage) of data.

Cleaning data

Want to glean insights from social media? Just skim a stream of postings and you can see right away what a data mess it is. The sad truth be told, most data from any source is a mess. Typical problems with data of various types include the following:

• Customer datasets with old names, old addresses and phone numbers, purchases made by family members, and so on
• Databases with data entered into the wrong field (such as the street number where the phone number should be), empty fields, and other confusions that result when structured data interfaces with human behavior
• Sensor data with spikes caused by measurement errors and missing measurements
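
Several of the problems listed above can be detected mechanically before data is put to use. The following is a minimal sketch in Python, not a method taken from this book. The field names (name, address, phone), the rule for what counts as a phone number, and the spike threshold are all illustrative assumptions, and real datasets will need rules tuned to their own conventions.

```python
import re
from statistics import median

# Assumption: a phone number is 7 to 15 digits once spaces, dashes, dots,
# parentheses, and plus signs have been stripped out.
PHONE_DIGITS = re.compile(r"^\d{7,15}$")


def looks_like_phone(value):
    """Return True if value plausibly holds a phone number."""
    if not value:
        return False
    stripped = re.sub(r"[\s\-.()+]", "", value)
    return bool(PHONE_DIGITS.match(stripped))


def check_customer(record):
    """List the problems found in one customer record (a dict).

    The field names used here are hypothetical.
    """
    problems = []
    for field in ("name", "address", "phone"):
        if not (record.get(field) or "").strip():
            problems.append(f"empty field: {field}")
    phone = (record.get("phone") or "").strip()
    if phone and not looks_like_phone(phone):
        problems.append("phone field does not look like a phone number")
    return problems


def check_sensor(readings, threshold=5.0):
    """Return (missing_indexes, spike_indexes) for a list of readings.

    None marks a missing measurement. A spike is a reading whose distance
    from the median exceeds `threshold` times the median absolute deviation.
    The default threshold is an arbitrary illustrative choice.
    """
    missing = [i for i, r in enumerate(readings) if r is None]
    present = [r for r in readings if r is not None]
    if not present:
        return missing, []
    mid = median(present)
    mad = median(abs(r - mid) for r in present)
    spikes = [
        i for i, r in enumerate(readings)
        if r is not None and mad > 0 and abs(r - mid) > threshold * mad
    ]
    return missing, spikes
```

For example, check_customer({"name": "Ada", "address": "", "phone": "42"}) reports an empty address and a phone field that does not look like a phone number, which is the kind of wrong-field entry described above. Calling check_sensor([20.1, 20.3, None, 20.2, 95.0, 20.4]) reports index 2 as missing and index 4 as a spike. The median-based test is used here because a single large spike distorts a mean and standard deviation far more than it distorts a median.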
