Zaloni Zip: Data Quality

November 22, 2016 Adam Diaz


The term “data quality” refers not only to the properties that distinguish good data from bad data but also to what is done with that data once the decision has been made.

The first step, separating good data from bad data, might be as simple as filtering out missing values. A more complex check might make sure an SSN field has a value and follows the correct numerical pattern. We could even implement sets of rules that check multiple columns, each with its own properties.
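As a rough illustration of this kind of check, here is a minimal Python sketch. It is not how the Zaloni Data Platform implements rules; the column names (`name`, `ssn`, `age`), the `ddd-dd-dddd` SSN layout, and the age range are all assumptions chosen for the example.

```python
import re

# Assumed SSN layout for this example: 3 digits, 2 digits, 4 digits, hyphen-separated.
SSN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")
# Assumed age format: one to three ASCII digits.
AGE_PATTERN = re.compile(r"[0-9]{1,3}")


def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or str(value).strip() == ""


def valid_ssn(value):
    """The SSN must be present and match the assumed pattern."""
    return not is_missing(value) and SSN_PATTERN.fullmatch(str(value).strip()) is not None


def valid_age(value):
    """The age must be present, numeric, and within an assumed plausible range."""
    if is_missing(value) or AGE_PATTERN.fullmatch(str(value).strip()) is None:
        return False
    return 0 <= int(str(value).strip()) <= 130


# Hypothetical rule set: column name -> function returning True for acceptable values.
RULES = {
    "name": lambda v: not is_missing(v),
    "ssn": valid_ssn,
    "age": valid_age,
}


def failed_rules(record):
    """Return the columns whose rule the record fails, in rule order."""
    return [col for col, check in RULES.items() if not check(record.get(col))]


def is_good(record):
    """A record is good only if it passes every rule."""
    return not failed_rules(record)
```

Keeping the rules in a dictionary keyed by column means a new check can be added without touching the code that applies them, and `failed_rules` reports why a record was rejected rather than only that it was.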

The second step is deciding what to do with the data. Once we confirm that data passes our quality standards, we can load it into an external Hive table at a specific location in HDFS. What, then, do we do with bad data? Do we simply delete it? Do we copy it and archive it? The point is that data judged bad needs a defined process as well.
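The routing decision can be sketched in a few lines of Python. This is an illustrative sketch only: local directories stand in for HDFS locations, the file names are made up, and the default check (no missing values) is the simplest rule mentioned earlier rather than a full rule set. It also assumes every record has the same columns.

```python
import csv
from pathlib import Path


def has_no_missing_values(record):
    """Simplest possible quality rule: every field must be non-blank."""
    return all(v is not None and str(v).strip() != "" for v in record.values())


def route_records(records, fieldnames, good_dir, bad_dir, is_good=has_no_missing_values):
    """Write good records to good_dir and bad records to bad_dir.

    Both directories are local stand-ins for HDFS locations (an assumption
    made for this example). Returns (good_count, bad_count).
    """
    good_dir, bad_dir = Path(good_dir), Path(bad_dir)
    good_dir.mkdir(parents=True, exist_ok=True)
    bad_dir.mkdir(parents=True, exist_ok=True)

    records = list(records)
    good = [r for r in records if is_good(r)]
    bad = [r for r in records if not is_good(r)]

    # Hypothetical file names; headerless CSV so the files hold data rows only.
    targets = ((good, good_dir / "part-00000.csv"), (bad, bad_dir / "rejected.csv"))
    for rows, path in targets:
        if not rows:
            continue
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writerows(rows)

    return len(good), len(bad)
```

In a real deployment the good directory would correspond to the `LOCATION` of the external Hive table, and the bad directory to whatever archive area the team has agreed on. Archiving rejected rows rather than deleting them keeps the option of inspecting or reprocessing them later.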

In this video, simple examples stand in for what can be a much more complex process in the Zaloni Data Platform: deciding which data is good and which is bad, then performing the appropriate action on the data in each case.





About the Author

Adam Diaz

Big data & Hadoop thought-leader
