Monday, November 19, 2012

Big Data

"Big Data" is occurring because new activities are producing ever more un-modelled data, and because the cost of thinking about whether to store data now exceeds the cost of simply storing it. To put it simply, it's cheap (or should be) to store everything.

It's like my "stuff" box at home but on steroids. I throw all sorts of bits and pieces into my "stuff" box because it's cheaper than taking the time to sort out what I should keep. I also do so on the assumption that "there's value or maybe there will be value in that stuff". In general this turns out to be a delusion; it's mainly junk, but I can't help thinking that "there's a pony in that field".

Eventually the box becomes full and I'm faced with a choice. Buy a bigger box (scale-up), buy another box (scale-out) or just bin it anyway. I tend to do the latter. What I don't need is a bigger or better or more distributed "box" but instead a better "algorithm" for sorting out what has value and what doesn't.

A lot of "Big Data" seems to be about better "boxes", whereas in my honest opinion it should be focused on better "algorithms / models". I'm not against storing everything, especially when it's cheap to store data (e.g. distributed systems built with commodity components), as you never know what you might find. However, that shouldn't be the emphasis.

Oh, as for my "stuff" box, StorageBod humorously raised the idea of using the attic. Looks like I've got an even bigger "stuff" box now, though I'm not sure that helps me. I'll have to decide whether to fill the attic with lots of "stuff" boxes or use it as a free-for-all. Maybe I'll need a cataloguing system?

Of course, if I fill up my attic with stuff then I'll probably end up with some salesman telling me stories about how "Mr Jones found a lost lottery ticket" or "Ms Jones found an old master" in their attics. I'll probably end up spending a shed load of cash on the "Attic Drone Detection, Investigation, Cataloguing and Treasure Seeking (ADDICTS)" system.

I know there's a pony in that field somewhere, I'm sure of it. Otherwise I wouldn't just put this stuff in a box marked "stuff" - would I?