Lately it appears that data has finally become a first-class citizen. That citizen had, however, to take on a new image and a new name – “Big Data”. Which is fine – fame always comes with a price, right? By now it’s clear it’s all about a new paradigm and a new set of tools and technologies to take advantage of all this data. It’s like the old gold rush – in this case, people trying to find those nuggets of information that could be transformational to businesses.
Intent on getting more informed myself, I attended my first Big Data conference – BigDataTechCon, held in San Francisco.
So what did I get out of the conference? Here are my three takeaways:
- There’s evolution and disruption happening at a frantic pace: The open source community is on steroids with all this adoption. In fact, so are vendors, who are building on top of the Hadoop stack or helping to integrate with it better. As a result, the incubation pipeline is rich and healthy, and there’s a steady stream of projects hatching out of the incubator. My personal view is that there’s some overlap, and it’s not always easy to pick a tool or technology for a general use case. Many of these codelines are still at version 1 or below.
- Production, widely adopted use cases are still the typical batch ones: Hadoop excels at batch processing and storage. This theme was pretty consistent throughout the conference, and there were many talks around deploying Hadoop for ETL processing.
- Hadoop is awesome but you will see less of it in the future: According to Doug Cutting, the future of Hadoop is to be sort of the “Big Data OS”. Hadoop and its stack have evolved to become a platform for distributed computing. While MapReduce and HDFS are powerful, many consider the interfaces low-level and not too developer friendly. The future therefore appears to be hiding the low-level programming interfaces and exposing easier-to-use interfaces and libraries.
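To see why people call the MapReduce interface low-level, it helps to look at the shape of the model itself: even counting words means spelling out a map step, a shuffle, and a reduce step. Here’s a toy sketch of that flow in plain Python – not Hadoop’s actual Java API, just the same map → shuffle → reduce structure, with all function names being my own:

```python
from collections import defaultdict

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in an input split."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return (key, sum(values))

def word_count(documents):
    # Run every mapper, shuffle the pairs, then run one reducer per key.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data is big", "data wins"])
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'wins': 1}
```

That’s three functions plus glue for the “hello world” of Big Data – which is exactly why the higher-level layers being built on top of Hadoop are so welcome.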
Ok, so while the future of Hadoop is less (visibility) of it – now is still the time to get to know Hadoop better. So it’s time for some hands-on… I hope to have a completely self-built Hadoop cluster by my next post.