rhames
Oct 18, 2011

What is the best kind of storage to use on servers that run Hadoop?

I've read that there are problems getting Hadoop to work with SANs because data access isn't as fast as with DAS (direct-attached storage). Is there any truth to this rumor? I wanted to start a small Hadoop cluster for testing, and had been hoping to connect it to our current iSCSI solution.

bcastle
10/18/2011

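It's true: most Hadoop implementations store their data in the Hadoop Distributed File System (HDFS) on each node's local, direct-attached disks rather than on shared iSCSI storage. HDFS distributes and replicates data blocks across the nodes of the cluster, so processing can run on the node where the data already resides.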

bralphye
10/18/2011

You may wish to look into GlusterFS, an open source file system designed for handling large amounts of data. Red Hat is in the process of acquiring Gluster, which suggests that it should integrate well with Linux-based Hadoop implementations. This is the most promising news I've found about storing Hadoop data at lower cost.

http://www.gluster.org/about/

 
