Sep 22, 2011

Is the technology used in big data environments fundamentally different from that traditionally used in enterprise IT environments?

Hi beatrix1,

You might want to have a look at this excellent background article on Big Data. I think it will give you a good idea of why it's different from the usual IT situations.

Big data

"Big data[1] are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage,[2] search, sharing, analytics,[3] and visualizing. This trend continues because of the benefits of working with larger and larger datasets allowing analysts to "spot business trends, prevent diseases, combat crime."[4] Though a moving target, current limits are on the order of terabytes, exabytes and zettabytes of data.[5] Scientists regularly encounter this problem in meteorology, genomics[6], connectomics, complex physics simulations [7], biological and environmental research [8], Internet search, finance and business informatics. Data sets also grow in size because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing) "software logs, cameras, microphones, RFID readers, wireless sensor networks and so on."[9][10] Every day, 2.5 quintillion bytes of data are created and 90% of the data in the world today was created within the past two years.[11]

One current feature of big data is the difficulty working with it using relational databases and desktop statistics/visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers."[12] The size of "big data" varies depending on the capabilities of the organization managing the set. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."[13]"

When the data stored enters the petabyte range (1,000 terabytes) and above, the scaling and massive parallelism required differ from what one might find in a traditional IT environment. This of course means that IT management of data and servers must change: at some point, you can't just scale up the number of staff to manage this equipment and the software that runs on it. New tools and techniques are required as these changes move from the evolutionary to the revolutionary.
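To make the "massively parallel software" idea concrete, here is a minimal, single-process sketch of the MapReduce pattern that frameworks like Hadoop apply across thousands of servers. Each chunk stands in for the slice of data one server would process independently; the function names and sample data are illustrative, not a real framework API.

```python
from collections import Counter

def map_phase(chunk):
    # Each worker counts words in its own slice of the data,
    # with no coordination needed between workers.
    return Counter(chunk.split())

def reduce_phase(partial_counts):
    # Partial results from all workers are merged into one total.
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return total

chunks = ["big data big servers", "data grows and grows"]
result = reduce_phase(map_phase(c) for c in chunks)
print(result["big"], result["data"], result["grows"])  # prints: 2 2 2
```

The key design point is that the map phase is embarrassingly parallel, so adding servers (rather than adding staff) is how capacity grows; only the comparatively cheap reduce step needs the partial results brought together.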
