Nov 28, 2011

What are the emerging alternatives to Hadoop?

And what chance do they have?


Zillabyte is an alternative that's getting traction by trying to make big data easy for developers and data scientists. (Full disclosure: I work there.) I think we have a great chance, but it won't be an overnight shift, since Hadoop is well established in this space. We are trying to make distributed computing available to devs and data scientists who know how to program but don't want to deal with the difficulties of setting up a Hadoop cluster.

Hadoop is great for analyzing petabytes of data, but the tradeoff is that it's expensive to have a team set it up and maintain it. There are companies like Hortonworks that specialize in enterprise Hadoop installations to make this easier. I think it will be interesting over the next several years to see how the industry changes, and to see how Hadoop adapts.


Wrong link, here is the correct link


Indeed there are. In a recent post at MSys, we discussed alternatives to Hadoop. Check it out.



There are several alternative systems that can solve the same problems as Hadoop, and more.

https://stratosphere.eu/ is a project that started out as a research project at a university. It has a novel model that allows for more operators than just map and reduce (it also natively supports match, cross and more). It additionally allows for arbitrarily complex job graphs, so you can combine these operators in any way you like. For example, you could have three inputs that are joined, reduced, mapped and reduced again (by another key). You can even write to as many outputs as you want.
Additionally, Stratosphere supports iterative algorithms (often needed for Data Mining/Machine Learning). Since this is "natively" implemented in the system, Stratosphere does much better on those jobs than traditional Hadoop systems.
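
To make the job graph idea a bit more concrete, here is a rough sketch in plain Python. It does not use Stratosphere's actual API; the helpers `match`, `cross` and `reduce_by_key`, the function `run_job` and the sample data are all made up for illustration, and everything runs in memory on one machine. It only shows the shape of such a job: three inputs that are joined, reduced, mapped and reduced again by another key, with two outputs.

```python
from collections import defaultdict
from itertools import product


def match(left, right):
    """Equi-join two lists of (key, value) pairs on their keys."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, [])]


def cross(left, right):
    """Cartesian product of two inputs (shown only for completeness)."""
    return list(product(left, right))


def reduce_by_key(pairs, fn):
    """Group (key, value) pairs by key and apply fn to each group's values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: fn(values) for key, values in groups.items()}


def run_job(orders, users, rates):
    # Join input 1 (orders) with input 2 (users) on the user id.
    joined = match(orders, users)  # (user, (amount, country))
    # Re-key by country so input 3 (rates) can be joined in as well.
    by_country = [(country, (user, amount)) for user, (amount, country) in joined]
    with_rate = match(by_country, rates)  # (country, ((user, amount), rate))
    # Map: apply the rate and key each record by (user, country).
    converted = [((user, country), amount * rate)
                 for country, ((user, amount), rate) in with_rate]
    # First reduce: total per user.
    per_user = reduce_by_key(converted, sum)
    # Map again to a different key, then reduce by that key.
    rekeyed = [(country, total) for (_user, country), total in per_user.items()]
    per_country = reduce_by_key(rekeyed, sum)
    # Two outputs from a single job.
    return per_user, per_country


# Hypothetical sample data, integers only to keep the arithmetic exact.
orders = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3)]
users = [("alice", "DE"), ("bob", "US"), ("carol", "US")]
rates = [("DE", 2), ("US", 1)]

per_user, per_country = run_job(orders, users, rates)
print(per_user)
print(per_country)
```

In a real system each of these steps would be an operator in a distributed dataflow rather than a list comprehension, but the wiring between the steps is the part the paragraph above is describing.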

There is an actively developed open source version of it on GitHub: https://github.com/dimalabs/ozone

Another project is Spark: http://spark.incubator.apache.org/ 

It allows applications to be written in Scala, which is a very powerful and expressive functional programming language (Stratosphere also supports Scala). Its job setup is really fast, so it is well suited to small and medium-sized data and ad-hoc evaluations.

Also take a look at what Cloudera and its competitors are doing (Impala, Hive Stinger Initiative).


Disclaimer: I'm a developer of Stratosphere ;)



Hi Declanchase2,

There are many Hadoop alternatives out there. The HPCC Systems platform is among them for tackling Big Data problems. Unlike Hadoop distributions, which have only been available since 2009, HPCC is a mature platform. It provides a data delivery engine together with a data transformation and linking system equivalent to Hadoop. Its main advantages over other alternatives are real-time delivery of data queries and the extremely powerful ECL programming model.


More at http://hpccsystems.com

Hi Declanchase2,

Here are a few articles with ideas about alternatives to Hadoop.

Hadoop Fatigue -- Alternatives to Hadoop

What are some promising open-source alternatives to Hadoop MapReduce for map/reduce?

Alternatives for Hadoop/MapReduce data storage and management