How to Conquer Your Data Sprawl with AWS

December 10, 2019 Brett Carpenter

Data and analytics success relies on giving data analysts and data scientists quick, easy access to accurate, high-quality data. We believe the Zaloni Data Platform (ZDP) paired with AWS is the best solution currently on the market for achieving this.

In a recent project, together with AWS, we helped the TMX Group (a Canadian financial services company that operates equities, fixed income, derivatives, and energy markets exchanges) consolidate their complex data sprawl into an enriched, self-service data catalog.

This allowed TMX Group to put their data to work in use cases such as monetizing data for revenue growth and building 360-degree customer views to improve customer experience and uncover cross-sell and up-sell opportunities.

Architecting for AWS

[Figure: architecture for an S3 data lake managed by Zaloni]

When building a data lake on AWS, we recommend a zone-based architecture. Zones control how data is moved and processed, and role-based access adds governance and security controls. The approach also provides data lineage, which shows where data came from, where it is going, and what has happened to it over time.
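As a rough illustration of the zone-based idea, the sketch below models zones as S3-style key prefixes, attaches a set of roles to each zone, and records a lineage entry every time a dataset moves forward. The zone names (raw, trusted, refined), the role names, and the helper functions are assumptions made for this example; they are not taken from ZDP or from the TMX Group deployment, and nothing here calls AWS.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical zones, prefixes, and roles, chosen only for illustration.
ZONES = {
    "raw": {"prefix": "raw/", "roles": {"data_engineer"}},
    "trusted": {"prefix": "trusted/", "roles": {"data_engineer", "data_scientist"}},
    "refined": {"prefix": "refined/", "roles": {"data_engineer", "data_scientist", "analyst"}},
}
ZONE_ORDER = ["raw", "trusted", "refined"]


@dataclass
class Dataset:
    name: str
    zone: str = "raw"
    lineage: list = field(default_factory=list)

    def key(self) -> str:
        """Object key the dataset would have in its current zone."""
        return ZONES[self.zone]["prefix"] + self.name


def can_read(role: str, dataset: Dataset) -> bool:
    """Role-based access: a role may read only zones that list it."""
    return role in ZONES[dataset.zone]["roles"]


def promote(dataset: Dataset, step: str) -> Dataset:
    """Move a dataset to the next zone and append a lineage record."""
    idx = ZONE_ORDER.index(dataset.zone)
    if idx == len(ZONE_ORDER) - 1:
        raise ValueError(f"{dataset.name} is already in the final zone")
    source = dataset.zone
    dataset.zone = ZONE_ORDER[idx + 1]
    dataset.lineage.append(
        {
            "from": source,
            "to": dataset.zone,
            "step": step,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return dataset


if __name__ == "__main__":
    trades = Dataset("trades.csv")
    print(trades.key(), can_read("analyst", trades))
    promote(trades, "validate schema")
    promote(trades, "aggregate by day")
    print(trades.key(), can_read("analyst", trades))
    for entry in trades.lineage:
        print(entry["from"], "->", entry["to"], "via", entry["step"])
```

In a real deployment the access rules would more likely live in IAM policies or the governing platform rather than in application code; the point of the sketch is only that zone, access, and lineage can be tracked together per dataset.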

Understanding the data architecture is one thing, but what about actually deploying a data lake? How can you ensure success?

Data Lake Deployment Best Practices We Learned from TMX Group

1. Connect more data from more sources.

Connecting to a variety of distributed and siloed data sources, including cloud and on-prem data, and easily adding new sources to the catalog as they become available are essential to future-proofing your AWS data lake.

2. Catalog data for accurate, trusted, and repeatable use.

To gain insights from your data, you need to know what data you have. A data catalog that automates cataloging with machine learning and artificial intelligence, and that exposes detailed, active metadata for easy consumption, helps you get answers fast so you can act on them.

3. Govern data for security and traceability.

Data governance through role-based access control, along with masking and tokenization capabilities, is critical for complying with industry regulations around privacy and security. With so much attention on protecting customer data, data governance is a must-have for any organization.

4. Provide business users with self-service data access.

What good are a data catalog and data governance if your business users can’t access the data they need? Self-service data access lets them see the data they want, when they need it, without having to request it from IT. That’s a win-win!
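To make the masking and tokenization mentioned under practice 3 concrete, and to show how they can coexist with the self-service access in practice 4, here is a minimal sketch. It masks a value by hiding all but its last few characters, tokenizes a value with a keyed HMAC-SHA256 digest, and builds a per-role view of a record. The field names, roles, and the `CLEAR_FIELDS` policy are hypothetical, and the HMAC approach is a stand-in chosen for this example: it is one-way, whereas vault-based tokenization products can map tokens back to the original value. It is not a description of how ZDP implements either feature.

```python
import hashlib
import hmac

# Hypothetical policy: fields each role may see in the clear.
CLEAR_FIELDS = {
    "compliance": {"name", "account", "email"},
    "analyst": {"name"},
}
# Hypothetical: fields that are masked rather than tokenized when hidden.
MASKED_FIELDS = {"account"}


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters of a value."""
    if visible <= 0 or len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def tokenize(value: str, secret: bytes) -> str:
    """Deterministic one-way token: the same input and key give the same token."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def view_for(role: str, record: dict, secret: bytes) -> dict:
    """Return the record as a given role is allowed to see it."""
    clear = CLEAR_FIELDS.get(role, set())
    view = {}
    for field_name, value in record.items():
        if field_name in clear:
            view[field_name] = value
        elif field_name in MASKED_FIELDS:
            view[field_name] = mask(value)
        else:
            view[field_name] = tokenize(value, secret)
    return view


if __name__ == "__main__":
    secret = b"replace-with-a-managed-key"  # assumption: key comes from a secrets store
    customer = {
        "name": "Ada Example",
        "account": "4111111111111111",
        "email": "ada@example.com",
    }
    print(view_for("analyst", customer, secret))
    print(view_for("compliance", customer, secret))
```

Because the token is deterministic for a given key, analysts can still join or count on a tokenized column without ever seeing the underlying value, which is one way governance and self-service access can work together.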

Want more detail? This post is a short overview of a much more in-depth version published on the AWS blog.

Ready to get started with ZDP on AWS? Learn more by visiting the AWS Marketplace, or request a custom demo today!

About the Author

Brett Carpenter

Brett Carpenter is the Marketing Strategist for Zaloni. When he's not diving into the world of data lakes, creating engaging content, or leading community endeavors, he's either enjoying the great outdoors or exploring the food scene in the Raleigh-Durham area.
