Data ingestion is more than simply dumping your data into a lake. In fact, that’s how you’ll end up with a data swamp. You also need to capture metadata and catalog that data upon ingestion to create a healthy data ecosystem. You need to be able to direct the data to the right place as it is ingested and moves along its journey. That’s why it is so important to have a data ingestion and management platform in place to monitor this process and ensure there are no bumps in the road.
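The idea of capturing metadata and cataloging data at the moment of ingestion can be sketched in a few lines. This is a minimal illustration, not any particular platform's API; the field names, the sample file, and the notion of a "target zone" are all hypothetical, and a real platform would also capture schema, lineage, and data-quality statistics.

```python
import hashlib
import json
from datetime import datetime, timezone

# Create a tiny sample file so the sketch is self-contained
with open("orders.csv", "w") as f:
    f.write("order_id,amount\n1,9.99\n")

def capture_metadata(file_path: str, source: str, target_zone: str) -> dict:
    """Record technical metadata for a file as it is ingested."""
    with open(file_path, "rb") as f:
        payload = f.read()
    return {
        "file": file_path,
        "source_system": source,
        "target_zone": target_zone,  # e.g. raw, trusted, refined
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(payload),
        "checksum_sha256": hashlib.sha256(payload).hexdigest(),
    }

# Register the entry in a catalog (here, just printed as JSON)
entry = capture_metadata("orders.csv", source="erp", target_zone="raw")
print(json.dumps(entry, indent=2))
```

Recording even this small amount of metadata up front is what separates a searchable data lake from a swamp: every file arrives with a known source, destination, timestamp, and checksum.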
Build vs. Buy
One of the first decisions you have to make about data ingestion and management is whether to build the platform or buy it. You need to weigh the advantages of buying off the shelf against developing what you need yourself. There is no doubt that many IT teams can write the code and design their own software; however, you have to ask whether it's worth it. Building your own system distracts from the core business and adds costs over the long term, while purchasing a platform lowers your IT costs for hardware, software, and maintenance.
“When you embark on a big data journey,” says Dirk Jungnickel, Senior Vice President of Business Analytics at du Telecom, “you need something that allows you to build robust and reliable data pipelines, manage the underlying metadata, and continuously process large amounts of data. It was clear from the start that we wanted to buy a data management and ingestion solution.”
Automating Data Ingestion
Ingesting large amounts of data into a data lake is a challenge, in part because data lakes are so complex. On average, a data lake can consist of more than 20 different technologies, using different languages, managing different data types, and supporting various business purposes. It serves you well to reduce that complexity, for example by implementing technology that provides already-integrated components and automates processes as much as possible.
Automating your data ingestion is a key factor in getting the most out of your data lake. Without an automated ingestion solution, preparing data for ingestion is a highly manual, resource-intensive process that can significantly lower your ROI and add months to your production timeline.
Dirk Jungnickel continues, “The amount of data we ingest is in the range of many terabytes a day. Ingesting and processing that kind of data in a conventional data warehouse or relational database is almost out of the question as it is forbiddingly expensive. It almost came as a blessing that data management and ingestion technologies were invented which allow us to process this kind of data volume and they are tremendously valuable for us.”
Determining how to build the architecture to synchronize data ingestion from different sources into Hadoop can be a steep challenge. It’s also important to think beyond simply bringing in the data and develop a sound strategy for repeatable, consistent processes that manage the entire lifecycle of your data, no matter where it comes from.
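One way to picture a "repeatable and consistent process" across many sources is a single ingest routine driven by per-source configuration, so every source goes through the same lifecycle steps. The sketch below assumes a generic landing-zone model; the source names, paths, and placeholder validation rule are all illustrative, and a real pipeline would write to HDFS and update a catalog rather than print a summary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class SourceConfig:
    """Per-source settings; everything else about the pipeline is shared."""
    name: str
    reader: Callable[[], Iterable[dict]]  # pulls records from the source
    landing_path: str                     # e.g. an HDFS directory

def ingest(cfg: SourceConfig) -> dict:
    """One repeatable lifecycle for every source: read, validate, land."""
    records = list(cfg.reader())
    valid = [r for r in records if r]     # placeholder validation rule
    # In a real pipeline: write `valid` to cfg.landing_path on HDFS,
    # then register the batch in the catalog. Here we just summarize.
    return {"source": cfg.name, "read": len(records), "landed": len(valid)}

sources = [
    SourceConfig("crm", lambda: [{"id": 1}, {}], "/data/raw/crm"),
    SourceConfig("billing", lambda: [{"id": 7}], "/data/raw/billing"),
]
results = [ingest(cfg) for cfg in sources]
print(results)
```

The design point is that adding a new source means adding one configuration entry, not writing a new pipeline, which is what keeps the process consistent as the lake grows.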
Read more on Zaloni's Ingestion Factory solution.
By Brett Carpenter