Shift your thinking: Hadoop isn’t just another relational database

April 16, 2015

Most of us in the data management space are familiar with relational databases and their ETL tools. That’s why, as companies implement Hadoop, IT teams often reach for the same relational database ETL tools, adapted for use with Hadoop: they’re familiar and comfortable. However, if you really want to get the most out of your Hadoop investment, you need a bit of a mind shift away from how you’ve always done things.

Structural and data processing dissimilarities aside, there are three main points of differentiation to consider:

Structured versus unstructured data

With relational databases, you’re required to pre-define the schema and can only capture data that conforms to it. In contrast, Hadoop accepts any type of data in any format, including unstructured data like email, multimedia, web pages, presentations and other business documents. That said, you’ll still need to define a schema in order to use the data, but you have the flexibility to do it with third-party tools either when you load the data (schema on write) or when you use it (schema on read). We recommend adding schema on write when possible, because it will save you a lot of grief on the back end. For more about why, read this earlier post on why schema and metadata matter.
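To make the on-write versus on-read distinction concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API; the `SCHEMA` mapping, the sample records and the helper names are all hypothetical and exist only to illustrate the idea.

```python
import json
from datetime import date

# Hypothetical schema: field name -> function that converts the raw value.
SCHEMA = {
    "id": int,
    "name": str,
    "hired": date.fromisoformat,
}

def apply_schema(raw_line):
    """Parse one raw JSON line and coerce each field to its declared type."""
    record = json.loads(raw_line)
    return {field: convert(record[field]) for field, convert in SCHEMA.items()}

# Made-up raw input, as it might arrive from a source system.
raw_lines = [
    '{"id": "1", "name": "Ada", "hired": "2014-03-01"}',
    '{"id": "2", "name": "Lin", "hired": "2015-01-15"}',
]

# Schema on write: apply the schema once, at load time, and store typed records.
typed_store = [apply_schema(line) for line in raw_lines]

# Schema on read: store the raw lines untouched and apply the schema per query.
raw_store = list(raw_lines)

def read_with_schema(store):
    """Apply the schema lazily, each time the data is read."""
    for line in store:
        yield apply_schema(line)

hired_2015 = [r["name"] for r in read_with_schema(raw_store) if r["hired"].year == 2015]
```

Both paths end up with the same typed records. The difference is when the parsing and validation cost is paid, and when a malformed record would first be noticed: at load time with schema on write, or at query time with schema on read.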

Data quality

Hadoop offers more flexibility, but also presents additional challenges, when it comes to data quality. With relational databases, you can have a high degree of confidence in your data quality, because data can only be captured if it meets the schema criteria. With Hadoop, however, you capture all of the raw data, which may or may not have some “fields” missing.

For example, say you have employee records and you find that in 20% of them the date of birth is incomplete or missing. In a relational database, these records would likely be rejected, and either you’d go in and resolve the issue or the records wouldn’t be included in the data set.

In Hadoop, such incomplete records can be stored as-is, without any constraint. This matters for data users who don’t need the date of birth but can use other data from the records (e.g., gender and geolocation). With Hadoop, it’s the user who determines the level of data quality that’s acceptable, potentially allowing businesses to derive more value from their data.
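As a rough illustration of user-defined data quality, the sketch below keeps every record and lets each consumer state which fields it requires. The records, field names and the `usable` helper are hypothetical, and the code is plain Python rather than any Hadoop tool.

```python
# Hypothetical employee records; None marks a missing field.
records = [
    {"id": 1, "gender": "F", "region": "EU", "date_of_birth": "1980-02-11"},
    {"id": 2, "gender": "M", "region": "US", "date_of_birth": None},
    {"id": 3, "gender": "F", "region": "US", "date_of_birth": "1975-07-30"},
    {"id": 4, "gender": None, "region": "APAC", "date_of_birth": None},
    {"id": 5, "gender": "M", "region": "EU", "date_of_birth": "1990-12-05"},
]

def usable(records, required_fields):
    """Keep only the records that have every field this consumer needs."""
    return [
        r for r in records
        if all(r.get(field) is not None for field in required_fields)
    ]

# An age analysis needs the date of birth, so incomplete records drop out.
age_analysis = usable(records, ["date_of_birth"])

# A demographics report ignores the date of birth and can use more records.
demographics = usable(records, ["gender", "region"])
```

Nothing is rejected at storage time. Each consumer applies its own threshold, so the demographics report can still use a record that the age analysis has to skip.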

Data access and analytics

The ways you can access data are far broader in Hadoop than in SQL-based relational databases. With Hadoop, storage and the access mechanism are separate. This is critical as the Hadoop ecosystem continues to evolve. It means that you don’t have to move your data, even as access methods change. It also means that users can use different access tools on the same data, depending on their needs. As companies look for new ways to slice and dice data, capture new types of data, use larger data sets and employ predictive modeling, new approaches to understanding big data are needed. Some of these include graph analytics, machine learning and other advanced algorithms. This flexibility is part of the reason why there’s so much creativity in the Hadoop space right now. 
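The separation of storage from access can be sketched in a few lines of plain Python. The stored lines and both "access methods" here are hypothetical stand-ins, not real Hadoop tools; the point is only that the same stored data serves two different styles of analysis without being moved or rewritten.

```python
from collections import Counter, defaultdict

# Hypothetical stored data: one "sender,recipient" pair per line, kept as-is.
stored_lines = [
    "ann,bob",
    "ann,cat",
    "bob,cat",
    "cat,ann",
]

def rows(lines):
    """Parse the stored lines on demand; the stored data itself is not changed."""
    for line in lines:
        sender, recipient = line.split(",")
        yield sender, recipient

# Access method 1: a tabular, SQL-style aggregate (messages sent per person).
sent_counts = Counter(sender for sender, _ in rows(stored_lines))

# Access method 2: a graph view of the same data (who contacts whom).
graph = defaultdict(set)
for sender, recipient in rows(stored_lines):
    graph[sender].add(recipient)
```

If a new access method comes along later, it can read `stored_lines` in the same way, which is the property the paragraph above describes.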

Because data is managed and accessed so differently in Hadoop than in traditional relational databases, savvy IT professionals are investigating new approaches to implementing and managing their Hadoop projects. Specifically, instead of staying within their comfort zone and using existing relational database tools adapted to work with Hadoop, many are turning to data management tools that were developed expressly for Hadoop and work more seamlessly with it. This is where Zaloni comes in. Our Bedrock platform was built from the ground up with Hadoop in mind. We’d be happy to tell you more about why this really makes a difference.

 
