Are You Ready for the Future of Big Data?

November 10, 2016 Jean Georges Perrin

A few weeks ago, I went to the IBM World of Watson conference in Las Vegas, NV. As one of the (roughly) 500 IBM Champions worldwide, I treat this as my yearly migration, my regular “blue shot”, my commitment to the program. You get the gist. But who knows… What happens in Vegas stays in Vegas.

This article is one part of a three-part summary of what I learned at World of Watson (WoW). The articles are independent, and you do not have to read them in a specific order. In this part, I cover concepts that explore the future of big data, data lakes, governance, and data wrangling. A more personal summary can be read on my blog (JGP.net), with some restaurant suggestions (just a few). The second part focuses on Informix, its usage, and how active it is in the IoT world.

Governance is Everywhere

You could argue that this isn’t breaking news, but before this conference, I had the feeling that integrated governance was wishful thinking: governance was a separate project, not integrated into the business processes. As an example, this is exactly what Vanguard did when they started, more than 10 years ago, with their own DG (data governance) 1.0 tool.

IGC at Vanguard photo

Vanguard is a pioneer in the whole governance process. They started, like everyone, with Excel and other such office tools, before switching to a fully integrated process using IBM InfoSphere Information Governance Catalog (IGC).

They now use Open IGC for its extensibility, and thanks to the journey they pioneered, they have serious experience in metadata management and governance. That reassures me, as one of their 401(k) customers.

Speaking of metadata management, the "M" word is no longer a mystery. Proper governance requires cleansed and reliable metadata. For this reason, more and more tools integrate those features as part of their standard operating procedures.

And it’s a good feeling to see that I am not the only person working on the subject.

 

Integrated Governance.jpg

More than a long-term vision, IBM demonstrated some of those concepts in their new Data Connect product.

 

Open IGC Architecture.jpg

Open IGC supports extensibility (hence its “Open” prefix). If you look closely you won’t see Bedrock (more on that in a future post).

The industry increasingly needs this kind of integration: automatic metadata management is becoming key to governance.

Data Lenses and the Future of Machine Learning

Last week, Jane came to Carrboro High School wearing a green top with a pink skirt. Paul, who likes Jane, complimented her on her choice of clothes, but Julie, who witnessed the scene, thought he was making fun of her bestie, who’s going out with Philipp. Julie, who has a little crush on Paul (who only has eyes for Jane), reported the incident to Philipp. Of course, Philipp did not like that, and the two boys went behind the gym to solve the issue.

So, from a student’s point of view, this is normal high school drama, while for the admin staff it was bullying, even if Philipp and Paul solved their issues with a battle of “Magic: The Gathering”.

This was the theme of the example that MIT Media Lab director Joichi Ito used to explain the concept of data lenses. Of course, this noble institution is not spending all this energy to solve high school dramas, but rather to enhance the output of analytics and, more precisely, to bring a sense of “job experience” to traditional machine learning techniques.

So what’s a typical use case? Imagine an experienced cop (not starting any debate here). He has knowledge built on his experience: he sees clues where we would not see anything, and he knows where to look for evidence. The idea of data lenses is to build the prism through which the machine will see data differently.

We are not all PhD.jpg

Joi Ito is reminding us that we do not all have a PhD. 

What does it mean concretely? You tint your Machine Learning model with the experience of the professional. 
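The talk summary above does not describe an implementation, so the following is only a minimal sketch of one possible reading of a data lens: a function that turns the same raw event into features weighted by a given observer's experience. Every name here (`student_lens`, `staff_lens`, `interpret`, the event fields, and the threshold) is a hypothetical chosen for illustration, reusing the high school example.

```python
# Hypothetical raw event: the same facts are handed to every lens.
incident = {
    "confrontation": True,          # two students went off to settle a dispute
    "supervised": False,            # it happened behind the gym
    "resolved_peacefully": True,    # settled with a card game
}


def student_lens(event):
    """A student mostly cares about how things ended (illustrative assumption)."""
    risk = 0 if event["resolved_peacefully"] else 1
    return {"risk_score": risk}


def staff_lens(event):
    """Staff weigh the circumstances, whatever the outcome (illustrative assumption)."""
    risk = 0
    if event["confrontation"]:
        risk += 1
    if not event["supervised"]:
        risk += 1
    return {"risk_score": risk}


def interpret(event, lens, threshold=2):
    """Apply a lens first, then a trivial decision rule standing in for a model."""
    features = lens(event)
    return "bullying" if features["risk_score"] >= threshold else "ordinary drama"


student_view = interpret(incident, student_lens)
staff_view = interpret(incident, staff_lens)
print(student_view, "/", staff_view)
```

The data never changes; only the lens does. In a real pipeline, the decision rule would presumably be a trained model and the lens a set of expert-derived features, but that goes beyond what was presented.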

Cloud, Cognitive, and Analytics

These are the 3 keywords you should remember from World of Watson 2016.

IBM believes in Cloud, which some might say is not really surprising. I strongly believed in Cloud even before it was called Cloud. And really, as keynote speaker and three-time Pulitzer Prize winner Tom Friedman put it: “This ain’t no cloud, folks. This is a technological supernova, the explosion of a star. And we know what happens with the explosion of a star — it’s the center of everything”.

Tom Friedman cloud supernova quote.jpg

It sure is a high-level view, and it needs to be drilled down into concrete implementations, but everybody is working in some kind of cloud. My biggest belief is that hybrid clouds will be the predominant architecture for the next 5 years. This means that your software needs to be aware of this and benefit from it. I am not going for shameless Zaloni promotion here, but this is exactly the idea behind DLM (Data Lifecycle Management), where Bedrock can archive data in the cloud when it’s cold.
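To make the “archive it when it’s cold” idea a bit more tangible, here is a minimal sketch of an age-based tiering rule. It is not Bedrock’s API or its actual policy engine: the `Dataset` class, the tier labels, and the 90-day threshold are all assumptions made up for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    name: str
    last_accessed: datetime
    tier: str = "on_premises"  # hypothetical tier label


def apply_lifecycle_policy(datasets, now, cold_after=timedelta(days=90)):
    """Flag on-premises datasets not accessed within `cold_after` for the cloud archive.

    Returns the names of the datasets whose tier was changed.
    """
    moved = []
    for ds in datasets:
        if ds.tier == "on_premises" and now - ds.last_accessed > cold_after:
            ds.tier = "cloud_archive"  # a real system would also move the data
            moved.append(ds.name)
    return moved


# Hypothetical catalog entries, evaluated as of the article's date.
now = datetime(2016, 11, 10)
catalog = [
    Dataset("clickstream_2014", last_accessed=datetime(2015, 1, 5)),
    Dataset("orders_current", last_accessed=datetime(2016, 11, 8)),
]
moved = apply_lifecycle_policy(catalog, now)
print(moved)
```

The point of the hybrid approach is that the rule is driven by metadata (last access time, current tier) rather than by someone remembering to move files by hand.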

Cognitive is just about AI. But AI stands less and less for artificial intelligence; rather, it stands for augmented intelligence. Just as augmented reality displays additional information on your screen, augmented (aka extended) intelligence will help you make better decisions.

An example of augmented reality in Pokémon Go: a female Nidoran is walking on my desk as I work on metadata architecture.

Thanks to smarter applications that can pre-analyze your data, your analytics will get smarter and more impactful. This brings me to what was the biggest insight of the conference: just as IBM did with Linux a few years ago (phasing out all their operating systems in favor of Linux), IBM now defines Spark as an Analytics Operating System.

Rob Thomas and Adam Kocoloski Spark Analytics Operating System.jpg

Rob D. Thomas, VP Product Development IBM Analytics, and Adam Kocoloski, CTO for Data Services, co-founder of Cloudant, on Spark as an Analytics Operating System.

For me, this is a huge step forward, and it confirms that our choice of Apache Spark as the underlying transformation engine for Bedrock is the way to go. I look forward to embracing even more Spark features in our products (but I can’t share more for now).

Stay tuned. Not exactly everything that happens in Vegas stays in Vegas.

About the Author

Jean Georges Perrin

Jean Georges Perrin is a software architect for Zaloni. He is passionate about software engineering and all things data, small and big. His latest endeavors have brought him into the Apache ecosystem, with a definite penchant for Spark and Zeppelin. He is proud to have been the first in France to be recognized as an IBM Champion, and to have been awarded the honor for the ninth consecutive year. Jean Georges shares his more than 20 years of experience in the IT industry as a presenter and participant at conferences and through publishing articles in print and online media. His blog is visible at http://jgp.net. When he is not immersed in IT, which he loves, he enjoys exploring his adopted region of North Carolina with his kids.
