Monday, January 07, 2008

Who's the Daddy?

Commoditisation does not necessarily end with centralisation. The current IT trend would appear to be towards large-scale utility computing providers (such as Amazon EC2). However, I believe there is a future for P2P or F2F (friend-to-friend) infrastructure, where all the spare capacity and idle resources of your personal computing devices are connected together to provide a continuously available online computing grid.

This will probably need some form of reputation system based upon mechanism design theory, along the lines of the reputation currency developed in Tribler. It is also (in my view) most likely to start with the distribution of the social graph.
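
To make the idea concrete, here is a minimal sketch of such a reputation currency: each peer keeps a pairwise ledger of resources given to and taken from every other peer, and reputation is simply net contribution. This is an illustration of the idea only, not Tribler's actual mechanism; the class, the names and the netting rule are all my own assumptions.

```python
from collections import defaultdict

class ReputationLedger:
    """Pairwise ledger: ledger[a][b] = units peer a has provided to peer b."""

    def __init__(self):
        self.ledger = defaultdict(lambda: defaultdict(float))

    def record(self, provider, consumer, units):
        # Hypothetical bookkeeping: provider donates CPU/storage to consumer.
        self.ledger[provider][consumer] += units

    def reputation(self, peer):
        # Net contribution: resources given minus resources taken.
        given = sum(self.ledger[peer].values())
        taken = sum(row[peer] for row in self.ledger.values())
        return given - taken

ledger = ReputationLedger()
ledger.record("alice", "bob", 10.0)  # alice shares 10 units of CPU time
ledger.record("bob", "alice", 2.5)
print(ledger.reputation("alice"))    # 7.5, so alice is a net contributor
```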

In any case, I made some very rough, primitive and speculative estimates of the size of this available network, based upon the number of machines connected, time spent online, actual utilisation rates and so on. These suggest that there is, at the very least, an 800,000 machine network continuously available (and that is just the PCs).
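
For the curious, the estimate is nothing more than a few guesses multiplied together. Here is a quick sketch of that arithmetic, where every input is an illustrative assumption rather than a measured figure:

```python
pcs = 1_000_000_000    # PCs in the wild (order-of-magnitude guess)
participating = 0.01   # fraction whose owners would join such a grid
online = 0.25          # average fraction of the day spent connected
usable = 0.32          # share of capacity actually idle and harvestable

continuously_available = pcs * participating * online * usable
print(f"{continuously_available:,.0f} machine-equivalents")  # 800,000
```

Change any of the inputs and the answer moves in proportion, which is exactly why I'd call it rough, primitive and speculative.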

This doesn't sound unreasonable given what we know of zombie networks and the size of some of the BOINC projects such as SETI@home. A 2006 paper on the computational and storage potential of volunteer computing networks provides an analysis of 300,000 hosts participating in volunteer computing projects. Of course these machines were not available all of the time, but the spare resources they provided sustained a processing rate of 95.5 TFLOPS (tera floating point operations per second), 7.74 PB of available storage and an access rate of 5.27 TB per second.

The current BOINC stats show 2.5 million hosts, with over half a million of these active and providing 795 TFLOPS. Of course, this is just scratching the surface of the 1 billion+ PCs in the wild, let alone the number of games consoles, mobile and other devices.
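
For a sense of what those figures imply per machine, the arithmetic (using only the numbers quoted above) works out as:

```python
active_hosts = 500_000   # "over half a million" active BOINC hosts
boinc_tflops = 795       # aggregate throughput quoted above

per_host_gflops = boinc_tflops * 1_000 / active_hosts
print(f"{per_host_gflops:.2f} GFLOPS per active host")     # 1.59

pcs = 1_000_000_000
print(f"{active_hosts / pcs:.2%} of the installed base")   # 0.05%
```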

Another volunteer computing project, Folding@Home, combines PCs with PS3s and has over 250,000 active CPUs providing 1 PFLOPS (peta FLOPS). This huge publicly owned spare computing resource is only going to grow.

So let's compare these massive distributed computers, which are using a fraction of a percent of the available spare computing power owned by the public, to some of the giants of computing. According to the NYTimes, Google is estimated to have 450,000 servers, and it has been calculated that these provide between 126 and 316 TFLOPS. I'm not convinced by such figures, but it at least gives you a sense of proportion. A small fraction of a percent of the idle computing resources available to the public is probably larger than Google. How about the fastest operational supercomputer in the world, the IBM Blue Gene/L at Lawrence Livermore National Laboratory? With a sustained processing rate of 478.2 TFLOPS it makes a half decent effort of standing up to a fraction of a percent, but no more.
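
The back-of-envelope arithmetic behind that claim, assuming the roughly 1.59 GFLOPS per active host implied by the BOINC figures above, looks like this:

```python
per_host_gflops = 1.59      # implied by the BOINC figures above
pcs = 1_000_000_000

# 0.1% of the installed base, expressed in TFLOPS
tenth_of_a_percent_tflops = 0.001 * pcs * per_host_gflops / 1_000
print(f"{tenth_of_a_percent_tflops:,.0f} TFLOPS")  # ~1,590

google_tflops = (126, 316)  # the NYTimes-derived estimate quoted above
blue_gene_l_tflops = 478.2  # sustained rate of the fastest supercomputer
# 0.1% of the public's PCs comfortably exceeds both.
```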

So whilst I agree that commoditisation will lead to more centralisation in the first stages, I'm not convinced that this is the end of the story.

Who's got the biggest computing cloud? Amazon? Microsoft? Google?

I reckon in the long run, that'll be us.

Who's the Daddy? We're the Daddy! We just haven't got tooled up yet.