Friday, July 30, 2010

Private vs Public clouds

I thought this argument had been settled a long time ago; it seems not. So, once more, dear friends, I will put on my best impression of a stuck record. First, what is the difference between a public and a private cloud?
  • A public cloud (the clue is in the name) is open to the public.
  • A private cloud (the clue is in the name) is private to some set of people.
Naturally, there are shades of grey. For example, the set of people for which a private cloud is private might be one person, a department, a company, a community, a nation or some sort of collection of the above. It is common to use a variety of notations (community, government etc) to distinguish these constraints on use, i.e. which subset of people is allowed to use it.

There is another side to this, which is your relationship to the provider. The cloud is either :-
  • external to you and therefore controlled, operated and run by another party.
  • internal to you which means it is controlled, operated and run by yourself.
Now, once again there are shades of grey because it is perfectly possible for a community of companies to build a shared cloud environment. Examples of the notation include :-
  • AWS offers a public cloud which is external to everyone but Amazon.
  • Eucalyptus offers technology for a company to build a private cloud which is internal to that company.
You could write a list with examples of each but there is little point as no-one uses this notation. Instead, in common parlance, we tend to use the term public cloud with a single counterpoint of private cloud, meaning a cloud where a single organisation makes up the subset of private users and the cloud is provided internally to that organisation. Now that we have our bearings on the terms, this leaves a question ...

Why use a private cloud?

A private cloud (using the common meaning) is one that you control and operate. It hence overcomes - or at least creates the illusion of overcoming - many transitional risks such as governance, trust & security of supply. However, it does so at the potential loss of the economies of scale found in public clouds combined with additional costs such as planning, administration and management.

The choice over whether to use one type of cloud or another is always one of benefits vs risks (whether disruption, transition or outsourcing risks). A hybrid cloud strategy simply refers to using a combination of both public and private clouds to maximise benefits for a given appetite for risk.
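
To make that trade-off concrete, the sketch below shows what a toy workload-placement policy might look like. It is purely illustrative: the workload attributes, categories and rules are invented for the example and are not drawn from any particular provider or framework.

    # Illustrative only: a toy placement policy for a hybrid cloud strategy.
    # The workload attributes and rules below are invented for this example.

    def choose_cloud(workload):
        """Return 'private' or 'public' for a workload described as a dict."""
        # Transitional risks (governance, trust, security of supply) push
        # regulated or sensitive workloads towards the private cloud.
        if workload.get("data_classification") == "regulated":
            return "private"
        # Highly variable demand gains most from public utility pricing.
        if workload.get("demand_variability", "low") == "high":
            return "public"
        # Default: favour the economies of scale of the public provider.
        return "public"

    workloads = [
        {"name": "payroll", "data_classification": "regulated"},
        {"name": "web frontend", "demand_variability": "high"},
        {"name": "batch reporting"},
    ]

    for w in workloads:
        print(w["name"], "->", choose_cloud(w))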

Naturally, the actual risk can change with a variety of events. For example, the formation of competitive cloud marketplaces with easy switching between multiple providers reduces outsourcing risks (e.g. lack of pricing competition, loss of strategic control, lack of second sourcing options).

For a consumer of cloud services, the ideal scenario is multiple providers of the same service, the option to implement your own environment and no loss of strategic control or dependency on a single vendor. For this scenario to happen, the technology must be open sourced and hence the technology owners must first realise that in this cloud world value isn't generated through bits of software but instead through services.

In the same way that it took a book company to disrupt the hosting world by offering commoditised infrastructure services, a hosting company is now trying to do the same to the world of technology vendors through open source. This is just the start and whilst OpenStack is currently focused on the infrastructure layer, expect it to move up the computing stack in short order.

There are four companies who in my mind exemplify this whole commodity approach - Rackspace, Amazon, Google and Canonical. I expect they will become titans in this space.

-- 5th January 2017

In 2017, there still exists a surprising view that private cloud is more than transitional. Whilst Amazon, Google and Canonical have emerged as major players (in the case of Amazon, a Titan), the story of OpenStack and Rackspace was less rosy. A mix of poor strategic gameplay led to OpenStack becoming firmly entrenched in the private cloud space and, though it moved up the stack a bit, it never moved into the platform space. That charge has been left to Cloud Foundry, which is now facing off against Amazon's effort - Lambda. Rackspace lost the public cloud battle but it has re-invented itself around Amazon.

Monday, July 26, 2010

OSCON 2010

I thoroughly enjoyed the OSCON cloud summit and the talk that I gave at OSCON - the audiences were fantastic and the organisation was superb (huge thanks to Edd, Allison and the O'Reilly crew for making this happen).

I'm really proud to have played my small part in this event as the MC for the day, along with John Willis.

I haven't yet talked a great deal about my research, but the keynote at OSCON gives a taste of it - so I thought I'd link to it here. Those who know me also know that this has been a hobby horse of mine over the last decade. It's finally good to spend some focused time on it, though of course these ideas are far from new.

A couple of final notes :-

  • Utility services are just a domain within the commodity phase of an activity's evolution. There are constraints which will prevent some commodities from being provided through services. I sometimes plot on the graph a wider "services" stage, however for the sake of simplicity I've left this out.
  • The stages of lifecycle are approximate only i.e. this is where products appear, this is where utility services generally appear etc.
  • Multiple activities can be bundled into a single product. For example, the iPhone is a combination of different activities from personal communication to digital recorder to web surfing to time keeper to ... the list is quite long. These activities are all evolving and being implemented by others, which forces Apple to focus on two areas :- the bundling of new innovative activities into the iPhone and application innovation through the App Store. The former is expensive and risky. The latter requires development of a strong ecosystem, ideally with users being allowed to create and distribute their own applications. The manner in which Apple manages this is less than ideal and they now face severe disruption from Android. As there is also little exploitation of the wider manufacturers' ecosystem, Apple has cornered itself into creating highly costly & risky innovations with weak leveraging. IMHO, they are in trouble and this should become painfully clear in the next five years unless they change.
  • The ILC model is generally applicable. I picked examples from cloud providers but equally I could have discussed Canonical with Ubuntu. Canonical ruthlessly commoditises activities to provide a stable core and I'd strongly argue that Rackspace & Canonical point to the future direction of IT.
  • Open source is the natural end state for any activity described by software which is ubiquitous and well defined. This doesn't mean that open source can't be used earlier, of course it can and there are numerous tactical advantages of doing so, along with benefits such as increased collaboration. However, what I am saying is that by the time an activity has reached the commodity phase then only open source makes sense. Those who have been questioning whether "cloud is the death of open source" have a poor understanding as to what is actually happening.
  • Open core is in general a tactical anomaly. On the one hand, if successful, it will cause widespread distribution (driving an activity towards more of a commodity) and yet it attempts to generate revenue through proprietary elements which is against the natural state that open core is forcing activities towards. A number of companies have used this approach successfully and have even been bought for huge sums by large companies. However, it still remains a tactical anomaly which attempts to achieve both the benefits of open and closed by being both.
  • The S-Curves I use are not time based. If you follow the evolution of an activity through specific phases of its lifecycle and plot adoption against time, you will derive a set of non-uniform S-Curves for Rogers' diffusion of innovations. It's important to realise that the accelerators I mentioned (open source, participation, network effects) along with others I didn't mention (communication mechanisms, co-evolution etc) alter the speed at which an activity evolves. Whilst this doesn't impact the S-Curves I use, it does compact Rogers' curves of more recent innovations when compared to earlier diffusions (see the illustrative sketch after this list).
  • The speed at which an activity moves across the profile graph (i.e. through its lifecycle) depends upon the activity.
  • None of these ideas are new. The nearest to new is company profile which I've been refining in the last year from earlier work (between '04-'07) and this refinement is simply a formalisation of already existing concepts. If you watched the video and thought, "that's new", then my only advice is be concerned.
  • On the question of science, the models presented (S-Curve, Profile) are part of a general hypothesis on the evolution of business activities. Whilst data exists, there is neither the volume of evidence nor independent observation to validate beyond this. Furthermore, whilst the models show some usefulness and can be falsified, they are not predictive (and hence this cannot be considered scientific but remains firmly within the field of philosophy). The reason for this is that in order to generate the graphs and avoid asymptotic behaviour, a definition of commodity is required. The consequence is that an activity can only be plotted in terms of relative historical position, i.e. after it has become a commodity. This means all positions of activities which have not become a commodity are uncertain (as per one of the axes of the graph) and therefore approximations. The models do not create a crystal ball and the future is one information barrier we can't get past. Even though the new patterns of organisation are testable, it should always be remembered that fitness does not guarantee survival.
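
On the point about accelerators compacting Rogers' curves, here is the small illustrative sketch promised above. It simply plots three logistic adoption curves with arbitrary rate constants; the numbers mean nothing in themselves, the point is only that faster diffusion compresses the curve when adoption is plotted against time.

    # Illustrative only: logistic (S-curve) adoption plotted against time for
    # three hypothetical activities. The rate constants are arbitrary; the
    # point is that accelerators (open source, network effects etc) steepen
    # and compact the curves of later diffusions.
    import numpy as np
    import matplotlib.pyplot as plt

    t = np.linspace(0, 30, 300)

    def adoption(t, midpoint, rate):
        """Fraction of eventual adopters at time t (simple logistic curve)."""
        return 1.0 / (1.0 + np.exp(-rate * (t - midpoint)))

    for label, rate in [("earlier diffusion", 0.4),
                        ("later diffusion", 0.8),
                        ("recent diffusion", 1.6)]:
        plt.plot(t, adoption(t, midpoint=15, rate=rate), label=label)

    plt.xlabel("time")
    plt.ylabel("adoption (fraction of eventual adopters)")
    plt.legend()
    plt.show()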

That's enough for now, I'll expand the topic sometime later.

Monday, July 19, 2010

OpenStack

There have been many attempts to create open source ecosystems around cloud computing over the last couple of years. Most have either not fully adopted the largest public ecosystem (EC2), not been truly open source (instead using an open core model), or lacked the experience of large-scale cloud operations.

The recent announcement of OpenStack changes this: entirely open sourced technology for building and running a cloud, supported by an ecosystem of large companies and agencies (including NASA and Rackspace), provision of the EC2 & S3 APIs, and the experience of running a large cloud installation.

This is fantastic news. If you want my view on how this will turn out, well it's rather simple.

OpenStack's move further consolidates the ecosystem around EC2 / S3 which is not only good news for Amazon but also helps propel Rackspace's position as a real thought leader in this space. It's worth noting that the EC2 / S3 API might be supplanted over time, especially as OpenStack builds a marketplace of providers, unless Amazon becomes more open with it. The icing on the cake will be if Rackspace itself (which will use the OpenStack technology) provides the EC2 / S3 APIs, in which case the growth and consolidation around Rackspace's efforts and any providers of OpenStack will become immense.
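
As a rough illustration of why API compatibility matters here, the sketch below points an existing EC2 client (boto, the common Python library at the time) at a non-Amazon endpoint. The host, port, path and credentials are placeholders and will vary by installation; this is a sketch of the idea rather than a recipe for any particular OpenStack release.

    # Illustrative only: because OpenStack (like Eucalyptus) can expose
    # EC2-compatible APIs, an existing EC2 client such as boto can simply be
    # pointed at a different endpoint. Host, port, path and credentials below
    # are placeholders.
    from boto.ec2.connection import EC2Connection
    from boto.ec2.regioninfo import RegionInfo

    region = RegionInfo(name="mycloud", endpoint="cloud.example.com")
    conn = EC2Connection(
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
        is_secure=False,
        region=region,
        port=8773,               # example port; depends on the installation
        path="/services/Cloud",  # example path; depends on the installation
    )

    # The same calls work against AWS itself or any EC2-compatible provider.
    for image in conn.get_all_images():
        print(image.id, image.location)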

This is also surprisingly good news for Eucalyptus if they move towards an entirely (or at least more) open approach. In such circumstances, the probability is that we're going to end up with a straightforward "clash of the titans" between Eucalyptus and OpenStack to become the Apache of Cloud Computing.

Don't be surprised if Eucalyptus even go so far as to adopt some of OpenStack's work. Marten Mickos is an astute businessman and there are many ways they can turn this to their advantage. However, in general it's not a good time to be any other infrastructure cloud technology vendor, as Simon Crosby makes clear with his "VMWare Redwood = DeadWood" post.

VMWare's position in the infrastructure space is looking unsurprisingly shaky for the future but then they already know of the oncoming disruption, as they made clear with this interview. Why else do you think that VMWare has been busily acquiring into the platform space? RabbitMQ is also increasingly looking like a great purchase for them.

As for RedHat's cloud strategy - they must be feeling increasingly lonely, as though no-one wants to invite them to the party. On the other hand, this is good news for Ubuntu, because of both UEC (powered by Eucalyptus) and OpenStack's involvement with the Ubuntu community. Don't be surprised if Ubuntu launches a "powered by OpenStack" version.

Best of all, it's great for the end users as they will see real choice and further standardisation of a messy industry in the infrastructure space. Of course, the real beauty is that once this happens we can finally start consolidating and standardising the platform space.

Overall, I'm very bullish about OpenStack and its focus on the Amazon APIs. There is a long road ahead but this has potential.

Tuesday, July 13, 2010

From Slime Mold to Neurons

I wasn't going to write much about clouds, being focused on my new area of research but I could hardly let James' post go unchallenged.

Before I critique the post, I need to go through some basic genetics for those of you who are new to that subject.

DNA is the accepted means of providing genetic instructions used in the development and functioning of all known living organisms. There are exclusions, such as RNA viruses (which are considered not to be living organisms) and forms of non-DNA-based inheritance from topology, methylation etc (epigenetics).

DNA doesn't operate in isolation, for example the same DNA sequence in a human produces a multitude of specialised cells. It instead acts in combination with both the environment it exists within and the environments it has existed within. Hence it is more correct to say that DNA contains genetic information that influences the phenotype (characteristics) of an organism.

To keep things simple, I'll ignore the multitude of RNA types (from messenger to transport), the issues of expression, the terminology of genes and 3D geometry and take a few chunky liberties in the description of how DNA works.

In principle, DNA consists of a long double helix sequence of four basic nucleotides (the base pairs) known as C, G, A and T. Different sections of this sequence (referred to as genes) are transcribed and translated into protein structures which affect the operation of the cell. Each three-letter word (a codon) of the genetic sequence (e.g. CGT or GAT, giving 64 possible combinations) is translated to an amino acid (of which there are 20 standard, 22 if you count the rarer proteinogenic ones).
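
To make the mapping concrete, here is a toy sketch of that translation step. Only a handful of the 64 codons are included and, as in the paragraph above, it takes the liberty of working directly with DNA letters rather than the mRNA intermediate.

    # Illustrative only: a toy translation of a DNA coding sequence into amino
    # acids, three letters (one codon) at a time. Only a few of the 64 codons
    # are listed here.
    CODON_TABLE = {
        "ATG": "Met",  # methionine, also the usual start codon
        "TGG": "Trp",  # tryptophan
        "GAT": "Asp",  # aspartate
        "CGT": "Arg",  # arginine
        "TAA": "STOP",
    }

    def translate(sequence):
        """Translate a DNA coding sequence into a list of amino acids."""
        protein = []
        for i in range(0, len(sequence) - 2, 3):
            amino_acid = CODON_TABLE.get(sequence[i:i + 3], "???")
            if amino_acid == "STOP":
                break
            protein.append(amino_acid)
        return protein

    print(translate("ATGTGGGATCGTTAA"))  # ['Met', 'Trp', 'Asp', 'Arg']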

The entire complexity of life is built upon such simple subsystems which in turn are part of ever more complex systems - cell structures that are part of cells that are part of organs etc. Without this component structure, the level of complexity in living organisms would not have been feasible. It's worth noting that the agility of complex structures to evolve is dependent upon the organisation of their subsystems.

So, what has this to do with cloud?

Well, if you take an example such as Amazon Web Services, the complexity of the many systems that users have developed with cloud services is based upon the provision of simple, standard subsystems for storage, compute resources and networks.

There is some limited variability in the types of subsystem (for example the sizes of Amazon instances and the introduction of the new Cluster Compute Instance) but these are the genetic analogy to amino acids, which are then used to build more complex protein structures. Your deployment scripts (whether you use a system such as RightScale or another) are your DNA, which is then transcribed and translated into the deployment of basic instances to create the complex structures you require.
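
A toy sketch of that analogy, for what it's worth: a simple deployment description (the "DNA") being expanded into individual launch requests for standard instance types (the "amino acids"). The format is invented for illustration; it is not RightScale's, Amazon's or anyone else's.

    # Illustrative only: a made-up "deployment script" being translated into
    # launch requests for standard instance types.
    DEPLOYMENT = [
        {"role": "web",    "instance_type": "m1.small",  "count": 4},
        {"role": "worker", "instance_type": "c1.medium", "count": 2},
        {"role": "db",     "instance_type": "m1.large",  "count": 1},
    ]

    def translate_deployment(deployment):
        """Expand the deployment description into individual launch requests."""
        for spec in deployment:
            for n in range(spec["count"]):
                yield {"name": "%s-%d" % (spec["role"], n),
                       "instance_type": spec["instance_type"]}

    for request in translate_deployment(DEPLOYMENT):
        print("launch", request["instance_type"], "as", request["name"])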

So, back to James' post. My objection to the post is that whilst you, as a user, can create a slime mold or a neuron or a million other cellular analogies with these basic components, the key is how YOU combine these common and well defined (i.e. commodity-like) components.

James, however, implies in his post that we need to see alternative cloud models, not just the "slime mold model cloud" but "more complex topologies" with the "emergence of more topologically calibrated and therefore feature rich clouds". In principle he is arguing for more configuration of these basic subsystems.

Whilst I agree that some additional basic subsystems (e.g. the cluster compute instance) are needed, I'd argue against the principle of wide-ranging diversity in the underlying subsystems. Whilst such a richness of diversity does create benefits for technology vendors - which company hasn't fallen for the "competitive advantage of customizing a CRM system" gag - it will not create the "wealth" and diversity of higher order, user-created systems. Instead it will lead to a grindingly slow sprawl that creates further lock-in issues, moves us away from competitive marketplaces and ends up with users spending vastly more time building and configuring stuff which really doesn't matter.

There are always edge cases, however in general the range of subsystems we require is fairly limited and it's from these we can build all the different types of systems (or cells) we want.

If there is anything that should be learned from biological analogies, it is from such "modest entities", such simple subsystems, that complexity and diversity are created. We've learned this lesson throughout time, from the electronic to the industrial revolution to the works of Vitruvius.

Unfortunately, as the old adage goes - "the one thing you learn from history, is we never learn from history".

One final note: analogies between cloud computing and biological systems are generally weak at best - my above example is no exception. I use it purely to continue in the same spirit as the original discussion and to try and highlight the core issue of diversity in the subsystems vs diversity in what is built with stable subsystems. I don't recommend comparing cloud computing to biology, unless you want endless arguments.

One very final note: computer models are based on simple arithmetic and hence are bound by Gödel's incompleteness theorems, never being both complete and certain. As activities provided through software tend towards being ubiquitous and well defined, they will tend towards being a good enough component, like a defined brick with standardised interfaces. The components themselves will have some inherent non-linear qualities (e.g. the halting problem), which is why the design for failure paradigm is so important.
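
As an aside, a minimal sketch of what "design for failure" means in practice is below: assume any call to a component can fail, retry with backoff, and degrade gracefully rather than crash. The failure rate and retry parameters are arbitrary.

    # Illustrative only: a minimal "design for failure" pattern - assume any
    # call to a component can fail, retry with backoff and degrade gracefully
    # rather than assuming the component always behaves.
    import random
    import time

    def flaky_component():
        """Stand-in for any subsystem call that can fail unpredictably."""
        if random.random() < 0.5:
            raise RuntimeError("component failure")
        return "result"

    def call_with_retries(fn, attempts=3, base_delay=0.1):
        """Retry a failing call a few times with exponential backoff."""
        for attempt in range(attempts):
            try:
                return fn()
            except RuntimeError:
                time.sleep(base_delay * (2 ** attempt))
        return None  # degrade gracefully instead of crashing

    print(call_with_retries(flaky_component) or "fallback response")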

Biological components are also only linear at a superficial level [e.g. of interfaces such as this codon encoding for this amino acid or a specific cell structure having certain functions or a specific cell type having a defined function] and on an individual level [behind the interface] they are highly non-linear and cannot be described by simple arithmetic means nor modelled with certainty by a computer. For these reasons, biological systems have evolved highly complex regulatory systems (such as the human immune system) which even today we barely understand. However, we're acutely aware that a major function of the immune system is to control rogue cells. This level of complexity is far beyond the capabilities of modern computing and is also filled with numerous information barriers (the uncertainty principle is only one of many) which prevent us from anything more than approximation.

However, there are many useful concepts in biological systems (Red Queen, ecosystems, co-evolution etc) along with certain derived concepts, such as design for failure, which have value in the computing world - just be careful about pushing the analogies too far.

--- Update 23 April 2014
Added [emphasis] to clarify certain points