Tuesday, July 13, 2010

From Slime Mold to Neurons

I wasn't going to write much about clouds, being focused on my new area of research but I could hardly let James' post go unchallenged.

Before I critique the post, I need to go through some basic genetics for those of you who are new to that subject.

DNA is the accepted means of providing genetic instructions used in the development and functioning of all known living organisms. There are exclusions, such as RNA Viruses (which are considered not to be living organisms) and forms of non DNA based inheritance from topology, methylation etc (epigenetics).

DNA doesn't operate in isolation, for example the same DNA sequence in a human produces a multitude of specialised cells. It instead acts in combination with both the environment it exists within and the environments it has existed within. Hence it is more correct to say that DNA contains genetic information that influences the phenotype (characteristics) of an organism.

To keep things simple, I'll ignore the multitude of RNA types (from messenger to transport), the issues of expression, the terminology of genes and 3D geometry and take a few chunky liberties in the description of how DNA works.

In principle, DNA consists of a long double helix sequence of four basic nucleotides (the base pairs) known as C,G,A,T. Different sections of this sequence (referred to as genes) are transcribed and translated into protein structures which affect the operation of the cell. Each three letter word (a codon) of the genetic sequence (i.e. CGT or GAT, giving 64 possible combinations) is translated to an amino acid (of which there are 22 standard).

The entire complexity of life is built upon such simple subsystems which in turn are part of ever more complex systems - cell structures that are part of cells that are part of organs etc. Without this component structure, the level of complexity in living organisms would not have been feasible. It's worth noting that the agility of complex structures to evolve is dependent upon the organisation of their subsystems.

So, what has this to do with cloud?

Well, if you take an example such as Amazon's Web Services, the complexity of the many systems that users have developed with cloud services is based upon the provision of simple, standard subsystems for storage, compute resources and networks.

There is some limited variability in the type of subsystems (for example the size of Amazon instances) and the introduction of a new Cluster Compute Instance but these are the genetic analogy to amino acids which are then used to build more complex protein structures. Your deployment scripts (whether you use a system such as RightScale or another) are your DNA which is then transcribed and translated into the deployment of basic instances to create the complex structures you require.

So, back to James' post. My objection to the post is that whilst you, as a user, can create a slime mould or a neuron or a million other cellular anologies with these basic components, the key is how YOU combine these common and well defined (i.e. commodity-like) components.

James' however infers in his post that we need to see alternative cloud models, not just the "slime mold model cloud" but "more complex topologies" with the "emergence of more topologically calibrated and therefore feature rich clouds". In principle he is arguing for more configuration of these basic subsystems.

Whilst I agree that some additional basic subsystems (e.g. the cluster computer instance) are needed, I'd argue against the principle of wide ranging diversity in the underlying subsystems.  Whilst such a richness of diversity does create benefits for technology vendors - which company hasn't fallen for the "competitive advantage of customizing a CRM system" gag - it will not create the "wealth" and diversity in higher order user created systems but instead lead to a grindingly slow sprawl that will create further lock-in issues, move us away from competitive marketplaces and end up with users spending vastly more time building and configuring stuff which really doesn't matter.

There are always edge cases, however in general the range of subsystems we require are fairly limited and it's from these we can build all the different type of systems (or cells) we want.

If there is anything that should be learned from biological analogies, it is from such "modest entities", such simple subsystems that complexity and diversity is created. We've learned this lesson throughout time, from the electronic to the industrial revolution to the works of Vitruvius.

Unfortunately, as the old adage goes - "the one thing you learn from history, is we never learn from history".

One final note: analogies between cloud computing and biological systems are generally weak at best - my above example is no exception. I use it purely to continue in the same spirit as the original discussion and to try and highlight the core issue of diversity in the subsystems vs diversity in what is built with stable subsystems. I don't recommend comparing cloud computing to biology, unless you want endless arguments.

One very final note: computer models are based on simple arithmetic and hence are bound to Godel's law of incompleteness, neither being being both complete and certain. As activities provided through software tend towards being ubiquitous and well defined, they will tend towards being a good enough component like a defined brick with standardised interfaces. The components themselves will have some inherent non-linear qualities (i.e. the halting problem) which is why the design for failure paradigm is so important.

Biological components are also only linear at a superficial level [e.g. of interfaces such as this codon encoding for this amino acid or a specific cell structure having certain functions or a specific cell type having a defined function] and on an individual level [behind the interface] they are highly non-linear and cannot be described by simple arithmetic means nor modelled with certainty by a computer. For these reasons, biological system have evolved highly complex regulatory systems (such as the human immune system) which even today we barely understand. However, we're acutely aware that a major function of it is to control rogue cells. This level of complexity is far beyond the capabilities of modern computing and also filled with numerous information barriers (the uncertainty principle is only one of many) which prevent us from anything more than approximation. 

However there are many useful concepts in biological systems (Red Queen, Ecosystems, Co-evolution etc) along with certain derived concepts, such as design for failure which have value in the computing world - just be careful on pushing the analogies too far.

--- Update 23 April 2014
Added [emphasis] to clarify certain points