The Hadoop 2.x release has expanded the scope of the framework like never before. With additions such as YARN, Hadoop-based platforms can now be used to build data hubs efficiently. Zaloni's Bedrock Management Platform™ is a logical and superior choice for creating these data hubs, for several reasons.
Zaloni has worked closely with customers in different verticals to create Bedrock. Our solution is built from the ground up with Hadoop as the target data platform. Bedrock bridges the gap in data management features that are common in traditional enterprise data warehouses but are missing from Hadoop-based data hubs.
In traditional systems there is always a complex process of database design and modeling before you can onboard data into a data warehouse. This means long lead times for new datasets: upfront definition of schemas, processing and reports, and other time-consuming steps, all of which delay the point at which insights can be derived from the original data. Data in a traditional system is also stored in silos, so one business unit may not be able to easily access data generated by another. These issues, along with reliance on proprietary hardware and software, create risk and require a huge investment.
In a world where innovation moves at an unprecedented speed, companies want to avoid this lengthy, costly process and gain insights from their data as soon as possible.
Companies also want data hubs to include both structured and unstructured data, and data from a variety of sources, including operational data stores, OLTP systems and data warehouses. By breaking down these silos and uniting the data in the data hub, data consumers can access a wider variety of data to support multiple use cases.
When creating a data hub based on Hadoop, there are common management functions that need to be implemented. These include managing:
- The data ingestion process from various sources.
- The workflow and processing of the data as it's ingested, including cleansing, watermarking, masking/tokenization and any required transformations. These steps make the data ready for use by various data consumers once it's ingested.
- Metadata that captures information about the ingested data, as well as any new data that is created in the data hub. This includes business, technical and operational metadata that can support various uses, such as ingestion history, lineage, security, governance and ETL workflows.
- Policy-based retention and lifecycle management, both for data ingested into the data hub and for new data generated in the hub.
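To make the masking/tokenization step above concrete, here is a minimal sketch of deterministic tokenization applied to a record during ingestion. This is not Bedrock's actual implementation, which is proprietary; the field names, the key handling, and the `tokenize`/`mask_record` helpers are all hypothetical, and in practice the key would come from a managed secrets store.

```python
import hashlib
import hmac

# Assumption: in a real deployment this key lives in a secrets manager,
# not in source code.
SECRET_KEY = b"replace-with-a-managed-secret"

def tokenize(value: str) -> str:
    """Replace a sensitive value with a stable, irreversible token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]

def mask_record(record: dict, sensitive_fields: set) -> dict:
    """Tokenize the sensitive fields; pass other fields through unchanged."""
    return {k: tokenize(v) if k in sensitive_fields else v
            for k, v in record.items()}

row = {"customer_id": "C1001", "ssn": "123-45-6789", "region": "east"}
masked = mask_record(row, {"ssn"})
```

Because the token is an HMAC of the value, the same input always maps to the same token, so joins across datasets still work even though the original value cannot be recovered.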
Bedrock provides these core management features out of the box. It integrates data ingestion, cleansing and processing to create end-to-end managed data pipelines for better manageability, traceability and operations.
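As an illustration of what "end-to-end managed" means, the sketch below wraps each pipeline step so that operational metadata (step name, start time, record counts) is captured automatically as data flows through. Bedrock's internals are not public; the step names and the `managed_step`/`run_pipeline` helpers here are invented for illustration only.

```python
from datetime import datetime, timezone

def managed_step(name, fn, records, log):
    """Run one pipeline step and record operational metadata about it."""
    start = datetime.now(timezone.utc)
    out = [fn(r) for r in records]
    log.append({"step": name, "in": len(records), "out": len(out),
                "started": start.isoformat()})
    return out

def run_pipeline(records, steps):
    """Chain steps into one managed pipeline; return results plus the audit log."""
    log = []
    for name, fn in steps:
        records = managed_step(name, fn, records, log)
    return records, log

# Toy steps: cleanse whitespace, then watermark each record with its source.
cleanse = lambda r: {k: v.strip() for k, v in r.items()}
watermark = lambda r: {**r, "_source": "crm_feed"}

rows = [{"name": " Alice "}, {"name": "Bob "}]
out, log = run_pipeline(rows, [("cleanse", cleanse), ("watermark", watermark)])
```

The accumulated `log` is the kind of operational metadata that supports the traceability and lineage uses described above: every record batch can be traced through each step it passed.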
Customers can use Bedrock to connect easily to disparate data sources, including EDWs, and quickly build and deploy data hubs. Zaloni also provides end-to-end implementation services that support all commercial Hadoop distributions. Examples of these use-case implementation services include pre-processing hubs for ETL offload, mainframe offloads and data warehouse archives.
Additionally, Zaloni has recently partnered with IBM to create a best-of-breed data hub platform. By deploying Bedrock with IBM BigInsights, customers can now leverage enterprise-grade features of the Hadoop distribution, such as redundancy across data centers for disaster recovery, improved scheduling of MapReduce workflows and improved security, along with the management features provided by Bedrock. In my next blog post I'll dive deeper into our relationship with IBM and what it means to enterprises.
Meanwhile, I invite you to ask questions in the comments below. I will be reading your questions and will compile them and respond in an upcoming blog post.
About the Author
Ben Sharma is CEO and co-founder of Zaloni. He is a passionate technologist with experience in business development, solutions architecture, and service delivery of big data, analytics and enterprise infrastructure solutions. Having previously worked in management positions for NetApp, Fujitsu and others, Ben's expertise ranges from business development to production deployment in a wide array of technologies including Hadoop, HBase, databases, virtualization and storage. Ben is the co-author of Java in Telecommunications and holds two patents. He received his MS in Computer Science from the University of Texas at Dallas.