Data lakes aren’t a novel idea. Chances are your organization has created a simple data lake at one point or another. Data is ingested by one department, no one else has access to it, the data languishes, and it’s eventually abandoned. Should the blame rest on the shoulders of the data lake? Here’s how you can turn that lake around. (Spoiler alert: it doesn’t require starting over from scratch.)
Check your original objectives for the data lake
There are many reasons why your data lake might not be meeting expectations. A few problems you might have experienced are lack of adoption, no way to monitor the data, and no clear organization, which in turn makes the data hard to access.
Notice that these problems share a common denominator: none of them is a problem with the data lake itself. The issue lies in the data management and governance applied to it. To turn around the perception of a failed lake, you must apply the proper architecture and implement a management platform.
Assess your current architecture and strategy
Let’s dive into the architecture first. Building a standard data lake can be done in an afternoon. Building one properly takes some planning and design. By implementing a zone-based approach to your design, you can ensure that your data lake scales with your organization.
The following are aspects of your data lake design that your final architecture should include:
- Data ingestion from multiple sources
- Keeping original source data to provide a single source of truth
- Ensuring role-based access
- Standardizing data for single versions of truth
- Providing a sandbox to allow for non-production manipulation of data
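To make the list above a little more concrete, here is a minimal sketch of how zones and role-based access might be modeled. The zone names (`raw`, `trusted`, `sandbox`), the roles, the permission table, and the `/lake/...` path layout are all illustrative assumptions, not a prescribed design or any particular product's API.

```python
from enum import Enum


class Zone(Enum):
    """Hypothetical zones mirroring the design aspects listed above."""
    RAW = "raw"          # original source data, kept unchanged
    TRUSTED = "trusted"  # standardized data, one agreed version
    SANDBOX = "sandbox"  # non-production experimentation


# Assumed role-to-zone permissions; replace with your organization's own rules.
PERMISSIONS = {
    "data_engineer": {
        Zone.RAW: {"read", "write"},
        Zone.TRUSTED: {"read", "write"},
    },
    "analyst": {
        Zone.TRUSTED: {"read"},
        Zone.SANDBOX: {"read", "write"},
    },
}


def can_access(role: str, zone: Zone, action: str) -> bool:
    """Return True if the role may perform the action in the given zone."""
    return action in PERMISSIONS.get(role, {}).get(zone, set())


def zone_path(zone: Zone, source: str, dataset: str) -> str:
    """Build a storage path that keeps each ingested source separate per zone."""
    return f"/lake/{zone.value}/{source}/{dataset}"
```

In this sketch, an analyst can experiment freely in the sandbox but cannot touch the raw zone, so the original source data stays intact no matter what happens downstream.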
Don’t work alone. Make a data platform work for you
Once you’ve finished building a data lake, you’re done, right? Not so fast. Without a system in place, you can run into the same issues you had before. Enter a data management platform. Most platforms will sit on top of your data lake to monitor and control data from the moment it’s ingested through to the end user.
The best platforms can optimize your current data without the need to reingest, thus saving time that can be spent elsewhere, and provide a self-service catalog that brings governed transparency to your lake. Like the architecture, your platform needs to be properly implemented to support your organization through its digital transformation and beyond.
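As a rough illustration of what a self-service catalog does, the sketch below registers metadata about data that already sits in the lake, without copying or re-ingesting anything. The `CatalogEntry` and `Catalog` classes, their fields, and the example paths are hypothetical; a real platform will have its own model and interface.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing a dataset that already exists in the lake."""
    name: str
    location: str  # where the data already lives; nothing is moved
    owner: str
    tags: set = field(default_factory=set)


class Catalog:
    """A tiny in-memory catalog keyed by dataset name (illustrative only)."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        """Record metadata for a dataset, replacing any entry with the same name."""
        self._entries[entry.name] = entry

    def search(self, tag: str) -> list:
        """Return the sorted names of all datasets carrying the given tag."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = Catalog()
catalog.register(CatalogEntry("accounts", "/lake/raw/crm/accounts", "sales", {"crm", "customer"}))
catalog.register(CatalogEntry("tickets", "/lake/raw/support/tickets", "support", {"customer"}))
```

Here a call such as `catalog.search("customer")` lets someone outside the owning department discover both datasets, which is the kind of transparency the original single-department lake was missing.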
Instead of settling for the current state of your data lake, turn it around using a metadata-based approach to make it a successful part of your ecosystem. Your data lake can still be scalable and future-proof (without the need to start over). Even if you don’t have a data lake yet, you can benefit from working with us from the outset to ensure your lake scales with your growth.
About the Author: Brett Carpenter