Sep 19, 2011

What is Big Data?

I've been hearing a lot about Big Data. What makes this any different than the large amounts of data we've been generating up to now?

Here's a good overview of big data for you, OldHippie.

Big data

"Big data[1] are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage,[2] search, sharing, analytics,[3] and visualizing. This trend continues because of the benefits of working with larger and larger datasets allowing analysts to "spot business trends, prevent diseases, combat crime."[4] Though a moving target, current limits are on the order of terabytes, exabytes and zettabytes of data.[5] Scientists regularly encounter this problem in meteorology, genomics,[6] connectomics, complex physics simulations,[7] biological and environmental research,[8] Internet search, finance and business informatics. Data sets also grow in size because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), "software logs, cameras, microphones, radio-frequency identification readers, wireless sensor networks and so on."[9][10] Every day, 2.5 quintillion bytes of data are created and 90% of the data in the world today was created within the past two years.[11]

One current feature of big data is the difficulty working with it using relational databases and desktop statistics/visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers."[12] The size of "big data" varies depending on the capabilities of the organization managing the set. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."[13]"

What has happened is that fast multiprocessor servers, and the sheer growth in the number of servers, have let us build larger data sets in our data warehouses than our old tools and methodologies could handle. Getting results out of that mountain of data has been like finding a needle in a haystack. What makes Big Data different is that if your business is using a modern database that's optimized for it, like Sybase, there are tools and techniques that make it much easier to drill down and find the specific information you need. For example, Sybase IQ 15.3's MPP (massively parallel processing) grid architecture basically pools processing power, memory and access to data across servers, so that a large number of users can each run their own analytics at the same time. This is a boon to productivity, because we can now look at larger data sets to recognize long-term trends.
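To make the "massively parallel" idea a bit more concrete, here is a minimal Python sketch of the general pattern: split the data into partitions, let each worker total up its own slice, then merge the partial results. This is only an illustration of the divide-and-combine approach, not how Sybase IQ actually implements its grid. The function names and the sample sales rows are made up for the example, and threads on one machine stand in for what would be separate servers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n_partitions):
    """Deal rows round-robin into n_partitions lists, one per hypothetical node."""
    return [rows[i::n_partitions] for i in range(n_partitions)]


def partial_totals(partition):
    """Work done by one node: total sales per region for its own slice only."""
    totals = {}
    for region, amount in partition:
        totals[region] = totals.get(region, 0) + amount
    return totals


def merge_totals(partials):
    """Coordinator step: combine the per-node results into one answer."""
    merged = {}
    for partial in partials:
        for region, amount in partial.items():
            merged[region] = merged.get(region, 0) + amount
    return merged


def parallel_sales_by_region(rows, n_nodes=4):
    """Run the per-partition work concurrently, then merge the partial results.

    Threads are a stand-in (an assumption of this sketch) for separate servers.
    """
    partitions = split_into_partitions(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(partial_totals, partitions))
    return merge_totals(partials)


# Made-up sample data: (region, sale amount)
sales = [("east", 100), ("west", 250), ("east", 50), ("north", 75), ("west", 25)]
print(parallel_sales_by_region(sales, n_nodes=2))
```

The answer is the same no matter how many nodes the rows are spread across, which is what lets this kind of system grow by adding servers instead of buying one bigger machine.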
