4 Top Data Governance Observations from DGIQ 2017

June 26, 2017 Greg Wood

Zaloni was at the Data Governance and Information Quality (DGIQ) Conference in San Diego this month, and our conversations with attendees revealed some fairly clear patterns. Now that we’re back from the conference, here are four data governance and management observations worth sharing.

Governance for all industries, not just the chosen few

Sure, we talked to the usual suspects of data governance; there were plenty of financial, healthcare, and pharmaceutical companies at DGIQ. But more than ever, we’re seeing non-traditional players step into the world of governance. Every vertical was represented in some fashion; we talked to video streaming platforms, state and local governments, education companies, and device manufacturers, among others.

More than just a response to new rules or regulations, this signals a broader shift in the way companies think about their data. As the trend toward data monetization continues, it is becoming increasingly important to fully understand, protect, and track data assets at every stage. More industries will follow this trend as strong data policies become less of a differentiator and more a matter of table stakes.

Passive governance isn’t governance

At one time, governance of data meant tracking sources in an Excel spreadsheet. This type of data management is no longer feasible for an array of reasons, not least the increasing complexity, variety, and operational importance of data in most organizations. Add increasingly strict and demanding regulations such as GDPR and RDARR on top of that, and there’s no question that advanced tools are necessary to keep up.

Active data management is the norm, and that means defining a strategy, choosing appropriate tools, and building out a team to implement the necessary processes. Many tools can make these processes easier, but given the high stakes, we still haven’t seen a foolproof magic bullet. Eventually, we may reach a stage where governance can be completely automated by AI, but governance today still means having dedicated resources.

Data lineage is no longer just a “nice-to-have” 

By far, the #1 request we heard was for an effective data lineage solution. This makes sense given the architecture of most modern data systems: their complexity and interconnectedness make tracing data from input to output nearly impossible without some sort of automation.

Once PII or other sensitive data is introduced into the system, being able to tell where this data came from and where it is going, no matter the originating or destination system, could mean the difference between a multi-billion-dollar fine and a clean audit. And that’s before counting the operational efficiency and overall technical capabilities made possible by a good lineage graph.

Shameless plug #1: this is where a data lake can be a truly powerful governance tool; because data lakes maintain data from raw state all the way through to consumption, lineage can be traced natively, much more easily than in traditional architectures.
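
To make the idea of a lineage graph a little more concrete, here is a minimal sketch in Python. It is purely illustrative and does not show how any particular product works: the dataset names and the LINEAGE mapping are hypothetical, and a real system would capture these edges automatically rather than by hand. The point is simply that once the edges exist, "where did this come from?" and "where does this go?" become simple graph traversals.

```python
from collections import deque

# Hypothetical lineage edges (assumed for illustration only):
# each key is a dataset, each value lists the datasets it feeds.
LINEAGE = {
    "crm.customers_raw": ["lake.customers_clean"],
    "web.clickstream_raw": ["lake.sessions"],
    "lake.customers_clean": ["mart.loyalty_scores", "mart.marketing_export"],
    "lake.sessions": ["mart.loyalty_scores"],
}


def downstream(graph, start):
    """Return every dataset reachable from `start` (where the data goes)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # skip datasets already visited
                seen.add(child)
                queue.append(child)
    return seen


def upstream(graph, target):
    """Return every dataset that feeds `target` (where the data came from)."""
    # Reverse the edges, then reuse the same traversal.
    reverse = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(reverse, target)


if __name__ == "__main__":
    # If PII lands in the raw CRM table, which datasets might now contain it?
    print(sorted(downstream(LINEAGE, "crm.customers_raw")))
    # Which sources would an auditor need to review for the loyalty scores?
    print(sorted(upstream(LINEAGE, "mart.loyalty_scores")))
```

In this toy example, a single traversal answers the audit question for PII entering through the CRM source. The hard part in practice is not the traversal but keeping the edges complete and current across many systems.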

There are more choices than ever*

(*sort of)

It’s no secret that Hadoop is famous (or infamous) for the sheer number of new tools that come out on a seemingly daily basis. Given that fact, it’s also no surprise that there have been several attempts at filling the gap of data governance on Hadoop by building new tools.

Many of these are well known, such as Atlas and Navigator, but for most organizations they leave something to be desired. Point products exist to fill most of those gaps, but integration can be a long and difficult process, and it might require complete retooling when new tools or processes emerge. Many traditional EDW systems also have their own versions of all of these tools, further muddying the water for anyone looking to start or evolve their data policy.

All of this is to say, although choice is a good thing, and technology continues to make important leaps, there are still plenty of gaps to be filled and challenges to overcome when it comes to data governance technology. Data lakes are especially well-suited to fill many of these gaps (shameless plug #2), but no matter what, a well-planned, carefully implemented strategy is an absolute must.

These are just a few of the takeaways from DGIQ. Data governance is a huge, important, complex topic, but it is absolutely worth the effort to consider, and hopefully I have provided some broad context to start that thought exercise.
