Microservices are a key enabler of large, scalable implementations that make optimal use of hardware. Companies such as Netflix pioneered their use and have shown what they can achieve: Netflix has used microservices to support over 35% of all internet download traffic in North America.
Leveraging microservices requires developing software in an architecture that allows deployment of small, modular programs or services that communicate in a well-defined way. By deploying the services on separate hardware, the software can quickly scale to handle higher loads. Additionally, well-defined communication channels make high availability achievable: if a process dies, fault tolerance ensures that it is started again.
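The restart behavior described above can be sketched as a small supervisor loop. This is only an illustration under stated assumptions: the function name `supervise` and its parameters are hypothetical, and a production deployment would more likely rely on an existing process manager or cluster scheduler than on hand-written code.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.1):
    """Run cmd and restart it whenever it exits with a non-zero status.

    Hypothetical helper for illustration only.
    Returns the number of restarts that were needed before a clean exit.
    """
    restarts = 0
    while True:
        # Blocks until the child process exits.
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"process kept failing after {max_restarts} restarts"
            )
        restarts += 1
        # Short pause so a crashing service is not restarted in a tight loop.
        time.sleep(backoff_seconds)


if __name__ == "__main__":
    # A child that exits cleanly, so no restart is needed.
    count = supervise([sys.executable, "-c", "print('service ran')"])
    print(f"restarts needed: {count}")
```

Capping the number of restarts is a design choice: a service that fails repeatedly usually needs attention rather than endless retries.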
Implementing Hadoop holds the promise of ingesting, analyzing, and managing large datasets. Data can be pulled from existing infrastructure (mainframes, data warehouses (DWs), flat files, etc.) or from new sources (SaaS products like Salesforce, feeds from services like Twitter, etc.). As data needs increase, the software infrastructure must scale across the entire data management process, and microservices are an ideal way of achieving that goal. Although legacy programs can be retrofitted with some modularity, doing so is a major architectural change that can require a prohibitive investment before it yields much benefit. The ideal approach is to build a product from the ground up with a modular architecture.
Zaloni’s data lake management platform is built modularly, using microservices, to enable, orchestrate, and govern a Hadoop infrastructure. It is built to manage any velocity, variety, and volume of data as long as there is hardware to support it. Among other microservices, the platform relies on two key ones: the Data Collection Agent (DCA) and the Workflow Executor (WFE). The DCA manages the movement of data into and out of the data lake as well as within the lake itself. The WFE manages jobs that run in Hadoop and reports their execution status back to the platform. Both microservices can be deployed on the same edge node on which the data lake management platform resides. As a business grows and its needs change, the DCA and WFE can be run on as many edge nodes as necessary. This opens more ‘pipes’ into the heart of the data processing engine and ultimately increases performance.
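To make the ‘more pipes’ idea concrete, here is a minimal sketch of spreading ingestion work across several agents in round-robin order. It does not reflect how Zaloni’s platform actually schedules work: the function `assign_round_robin`, the node names, and the file names are all hypothetical and chosen only for illustration.

```python
from itertools import cycle


def assign_round_robin(files, agents):
    """Spread files across agents in round-robin order.

    Hypothetical helper: 'agents' stands in for collection agents running
    on separate edge nodes; it is not part of any real platform API.
    """
    if not agents:
        raise ValueError("at least one agent is required")
    assignment = {agent: [] for agent in agents}
    # cycle() repeats the agent list, so each file goes to the next agent in turn.
    for file_name, agent in zip(files, cycle(agents)):
        assignment[agent].append(file_name)
    return assignment


if __name__ == "__main__":
    files = [f"feed-{i}.csv" for i in range(6)]
    # With one agent, a single node handles every file.
    print(assign_round_robin(files, ["edge-1"]))
    # With three agents, each node handles a third of the files.
    print(assign_round_robin(files, ["edge-1", "edge-2", "edge-3"]))
```

Adding an agent to the list shrinks each node's share of the work, which is the scaling effect described above, assuming the agents run on separate hardware and do not contend for a shared bottleneck.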
About the Author: Aashish Majethia