Spark lets you do complex things with a lot less code. This lower barrier to entry, plus its faster processing speed, is “sparking” - pun intended - increased interest from enterprises in how they can take advantage of Spark. Adoption is still early, though: one survey shows that only about 13% of developers and data scientists currently use Apache Spark, while 51% say they are evaluating Spark or plan to use it for big data processing.
Zaloni is “Spark-ified”
Zaloni is committed to ensuring its solutions integrate seamlessly with new technologies as they emerge. As interest in Spark has grown, Zaloni has made it possible for enterprises to leverage Spark (in addition to MapReduce) for data management and data preparation. Zaloni also takes Spark to the next level by automatically generating Spark code behind the scenes – saving developers significant time and making Spark even easier for end users with different levels of technical ability.
Spark Within the Zaloni Data Lake Management Platform
The Zaloni Data Lake Management Platform leverages both MapReduce and Spark. If Spark is enabled on a cluster, the platform determines the best execution framework for a given workflow and can use Spark to:
- Speed up workflow actions such as watermarking, masking and tokenization
- Take custom Spark or Spark SQL code from developers or data scientists and submit the code to the cluster
- Allow users to configure how Spark jobs are deployed
- Simplify provisioning of data from the data lake with a Query Builder interface that can run on the Hive, Impala or Spark SQL engine – giving users the flexibility to choose the SQL-on-Hadoop engine they prefer
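To make the masking and tokenization actions in the list above more concrete, here is a minimal sketch in plain Python of what such field-level transformations can look like. It is not Zaloni's implementation: the function names, the sample record and the key are all hypothetical, and tokenization is approximated with a keyed hash (HMAC-SHA256) rather than a token vault. In a Spark job, functions like these would typically be applied column by column across the cluster.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would load it from a key store.
SECRET_KEY = b"example-key-not-for-production"


def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` characters of a value."""
    hidden = max(len(value) - max(visible, 0), 0)
    return mask_char * hidden + value[hidden:]


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a value with a deterministic token derived from a keyed hash.

    Assumption: a keyed hash stands in for a token vault, so the token
    cannot be reversed, but equal inputs always yield equal tokens.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# Hypothetical record with two sensitive fields.
record = {"name": "Ada", "card": "4111111111111111"}
protected = {
    "name": tokenize(record["name"]),  # same name always maps to the same token
    "card": mask(record["card"]),      # only the last four digits stay readable
}
print(protected["card"])  # ************1111
```

Because the token is deterministic, tokenized columns can still be joined or grouped on, which is one common reason to prefer tokenization over masking for identifiers.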
Spark for Self-Service Data Preparation
Zaloni's Platform also gives business users the ability to drag-and-drop entities and combine and filter results through its UI – while Spark Scala code is automatically generated on the backend. With the platform, business end users can perform self-service data preparation without needing to go through IT.
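As a rough illustration of the idea of generating Spark Scala code from a user's selections, the sketch below turns a small description of a "combine and filter" step into a Scala snippet. Everything here is an assumption made for illustration: the `PrepSpec` structure, the `generate_scala` function and the entity names are hypothetical and do not reflect how the Zaloni platform works internally. The generated text uses the standard Spark DataFrame calls `join` and `filter`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrepSpec:
    """Hypothetical description of what a user assembled in a UI."""
    left: str                 # name of the first entity (a DataFrame variable)
    right: str                # name of the second entity
    join_key: str             # column shared by both entities
    filters: List[str] = field(default_factory=list)  # SQL-style conditions


def generate_scala(spec: PrepSpec) -> str:
    """Render the spec as a Spark Scala expression (as text, not executed).

    Note: conditions containing double quotes would need escaping;
    this sketch does not handle that case.
    """
    lines = [f'val result = {spec.left}.join({spec.right}, Seq("{spec.join_key}"))']
    for condition in spec.filters:
        lines.append(f'  .filter("{condition}")')
    return "\n".join(lines)


spec = PrepSpec("customers", "orders", "customer_id", ["amount > 100"])
print(generate_scala(spec))
```

The point of the sketch is the separation of concerns: the business user only supplies the entities, the join column and the filter conditions, and the code that runs on the cluster is derived from that description.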
Zaloni has been working with Spark since it began gaining popularity with developers in 2014. As such, we have extensive experience leveraging Spark for clients using our platform. Spark isn’t the right processing engine for every project and we regularly advise clients when and when not to use it. If you’re interested in finding out if and how Spark could fit within your big data architecture, please give us a call.
About the Author
Scott Gidley is Vice President of Product Management for Zaloni, where he is responsible for the strategy and roadmap of existing and future products within the Zaloni portfolio. Scott is a nearly 20-year veteran of the data management software and services market. Prior to joining Zaloni, Scott served as senior director of product management at SAS and was previously CTO and cofounder of DataFlux Corporation.