Big Data is still an emerging field, with far too many products, promises and pitfalls. Despite all that, this is the season when industry pundits make fearless predictions about how Big Data will evolve in 2016.
Many prognosticators take a high-risk approach, titillating the reader with highly speculative projections. If a prediction comes true, the pundit is hailed as a genius. If not, no one will remember.
I’d like to take a contrarian approach and offer some low-risk Big Data predictions.
Why are they low risk? Because they are already true today, albeit under the radar.
Prediction #1: Spark will be more important than Hadoop
Hadoop has been around since the mid-2000s, and has evolved to the point where it can efficiently and reliably perform Big Data analytics. Spark has the advantage of being a “fast follower”, able to learn from and avoid Hadoop’s mistakes. Spark has a more generic and extensible programming model, which makes it easier to use for analytics. It can also handle Big Data in Motion, via Spark Streaming, and serves as the basis for a powerful graph processing library (GraphX) and a full-featured machine learning library (MLlib). Spark’s closest relative in the Hadoop world is Tez, which, like Spark, can execute algorithms organised as directed acyclic graphs. The Open Source community, recognising the similarity, has crowned Spark as the converged platform of choice, and it will soon replace Tez in the Hadoop platform. However you slice it, Spark is the future of Big Data computing.
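For readers who have not seen a job "organised as a directed acyclic graph", here is a minimal sketch in plain Python. It is not Spark's or Tez's API. The `run_dag` helper, the stage names and the toy word-count data are all hypothetical, chosen only to show the idea: each stage runs once the stages it depends on have finished, and their outputs become its inputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(stages, deps):
    """Run every stage after the stages it depends on (hypothetical helper)."""
    results = {}
    order = []
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = stages[name](*inputs)
        order.append(name)
    return results, order


# A toy word-count job: read -> split -> count (illustrative data only).
stages = {
    "read": lambda: ["big data", "big spark"],
    "split": lambda lines: [w for line in lines for w in line.split()],
    "count": lambda words: {w: words.count(w) for w in set(words)},
}
# Each stage maps to the stages whose output it consumes.
deps = {"read": (), "split": ("read",), "count": ("split",)}

results, order = run_dag(stages, deps)
print(order)
print(results["count"])
```

A real engine adds what this sketch leaves out: partitioning the data, running independent stages in parallel across a cluster, and recovering from failures.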
Prediction #2: There will be fewer Big Data start-ups
Venture capital investors view Big Data as last year’s trend. They’ve already doubled down on a variety of start-ups and want to see those investments pan out. So, if you have a great idea for a Big Data start-up – good luck. It will have to be pretty amazing to convince investors to divert money away from their existing Big Data investments.
Prediction #3: Oracle will continue to lose market share to Open Source Big Data technologies
The mainstream software giants have adopted various strategies to cope with the competition from Open Source Big Data platforms. Some have formed alliances (e.g. Microsoft and Hortonworks), some have embraced and extended (e.g. IBM Watson). The company that least seems to “get” this brave new world is Oracle, which continues to sell Exadata (an expensive alternative for Big Data analytics) and has launched its own proprietary NoSQL database that has no advantage over Open Source alternatives. Oracle is having trouble understanding that in 2016, most customers prefer to avoid vendor lock-in.
Prediction #4: Cassandra will become a dominant player in the NoSQL space
Cassandra has long been one of the fastest NoSQL databases, especially for write-heavy applications, and it provides an active-active distributed data centre topology out of the box. The knock on Cassandra was that it was hard to deploy, maintain and program. DataStax, the commercial vendor for Cassandra, seems to have noticed: the CQL language makes Cassandra far easier to program, and the OpsCenter management tool makes maintenance a lot simpler. Now if they would only implement an aggregation framework like HBase’s coprocessors, so that application code could run on the server.
There you have it – four low-risk Big Data predictions for 2016. See you at the end of 2016 to grade their accuracy!