Sep 22, 2011

Isn't big data all about analytics at scale? Shouldn't we be calling this Big Analytics?

Hi ktrope,

I posted this link in another question, but I think it's relevant here too. This article will give you a good background on Big Data that will help explain why it's not called Big Analytics.

Big data

"Big data[1] are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage,[2] search, sharing, analytics,[3] and visualizing. This trend continues because of the benefits of working with larger and larger datasets allowing analysts to "spot business trends, prevent diseases, combat crime."[4] Though a moving target, current limits are on the order of terabytes, exabytes and zettabytes of data.[5] Scientists regularly encounter this problem in meteorology, genomics[6], connectomics, complex physics simulations [7], biological and environmental research [8], Internet search, finance and business informatics. Data sets also grow in size because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing) "software logs, cameras, microphones, RFID readers, wireless sensor networks and so on."[9][10] Every day, 2.5 quintillion bytes of data are created and 90% of the data in the world today was created within the past two years.[11]
One current feature of big data is the difficulty working with it using relational databases and desktop statistics/visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers."[12] The size of "big data" varies depending on the capabilities of the organization managing the set. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."[13]"

No, because sometimes the scaling takes place only in the amount of data and on the CPU/storage/RAM side; the analysis itself doesn't change so much as it is required to work in real time with massive throughput. Other times, metadata must be used to structure the data into a more useful form. And on the analytical end, with all this data captured, we may need to devise new algorithms to reach more accurate conclusions. So the analytics may be different, but it isn't scaling up the way the data is.
