Wednesday, August 04, 2010

Islands in the sky

I'm often asked how the cloud will develop, to which I answer: "imperfectly, very imperfectly".

I was reminded of this through a long discussion with Benjamin Black, hence I thought I'd write something to explain my general thoughts on the problem. First, let me apologise as this will be a long post. Second, we need to start by recapping some basic concepts about risk. The barriers to adoption in cloud cover three basic forms of risk :-

Disruption Risks: Change to existing business relationships combined with issues around political capital and previous enterprise investment. It's often difficult to let go of that which we have previously invested in.

Transitional Risks: These risks are related to the shift from a world of products to a world of services and they include confusion over the models, trust in the service providers, governance of this service world, transparency from the providers and security of supply. Many of the transitional risks can be mitigated with a hybrid (private + public) cloud approach, a standard supply chain management technique. This approach has been used in many industries which have undergone a similar change; for example, in the early decades of power generation it was common to combine public generation with private generators. Even today most data centres mix a variety of public suppliers with backup generators and UPS systems. Fortunately, these transitional risks are relatively short-lived.

Outsourcing Risks: These cover lack of pricing competition between the new providers, lack of second sourcing options between providers, loss of strategic control to a specific technology vendor, lock-in and unsuitability of the activity for such service provision (i.e. it's not ubiquitous or well defined enough for volume operations based service provision). The outsourcing risks can be reduced through the formation of a competitive marketplace of providers with easy switching between them and ideally the option to bring service provision back in-house. The outsourcing risks are long term.

For a competitive market to form, you need easy switching, which means portability. The basic ingredients of portability include a choice of providers, access to your code and data from any provider and semantic interoperability between providers, i.e. both the origin and destination providers need to understand your code and data in the same way. There is limited value in having access to your code and data if no other provider understands it and operates to provide the same functionality, e.g. getting access to your data in Salesforce is great, but what do you do with it?

In such circumstances, there does exist a weaker form of syntactic interoperability, which means both providers can exchange data but the end result may not function in the same way and your data may not retain its original meaning. This is where we often see translation systems for converting from one provider to another, with the usual abundance of translation and semantic errors.
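To make the distinction concrete, here's a minimal sketch (in Python, with entirely hypothetical provider formats and field names) of what syntactic interoperability looks like in practice: the data gets exchanged, but a field with no equivalent on the destination is silently dropped and its meaning is lost.

    # Hypothetical example: translating a VM description between two
    # imaginary providers' formats. The data moves across (syntactic
    # interoperability) but meaning is not fully preserved.

    def translate_vm(provider_a_vm: dict) -> dict:
        """Convert a (hypothetical) Provider A VM spec to Provider B's format."""
        provider_b_vm = {
            "name": provider_a_vm["instance_name"],
            "ram_mb": provider_a_vm["memory"] * 1024,  # A uses GB, B uses MB
            "cpus": provider_a_vm["vcpus"],
        }
        # Provider A guarantees anti-affinity placement; Provider B has no
        # such concept, so the field is simply dropped -- a semantic error
        # waiting to surface as correlated failures after migration.
        return provider_b_vm

    vm = {"instance_name": "web-01", "memory": 4, "vcpus": 2,
          "anti_affinity_group": "frontend"}
    print(translate_vm(vm))  # the placement guarantee is silently lost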

The ideal situation is therefore semantic interoperability, which generally means a common reference model (i.e. running code) which providers either operate or conform to. Unfortunately, common reference models come with their own risks.

Let us suppose you have a marketplace of providers offering some level of service at a specific level of the computing stack (SPI Model) and these providers operate to a common reference model. The model provides APIs and open data formats, giving you access to your code and data. You therefore have a choice of providers, access to your data and semantic interoperability between them. You have portability. BUT, if that common reference model is owned by a vendor (i.e. it's proprietary code) then that market is not free of constraints but instead controlled by the vendor. All the providers & consumers in that marketplace hand over a significant chunk of strategic control and technology direction to the vendor, who is also able to exert a tax on the market through licence fees.

To reduce this loss of strategic control and provide a free market (as in free of constraints), that common reference model must not be controlled by one party. It has to be open sourced. In such an environment, competition is all about operational efficiency and price vs QoS rather than bits. This makes intuitive sense for a service world, which is why I'm pleased OpenStack is following that route and I hope it will become the heart of a market of AWS clones. Obviously, you'll need different common reference models at different layers of the computing stack. Whilst only one is probably needed for infrastructure, you will need as many as there are competitive application marketplaces (CRM, ERP etc.) in the software layer of the SPI model.

Before anyone trots out the old lie that standardisation hampers innovation, it's worth remembering that utility service provision (which is what cloud is really about) requires volume operations, which in turn requires a ubiquitous and well defined activity. Whilst the common reference models certainly won't be perfect in the beginning, they don't need to be - they only have to create "good enough" components (such as a defined virtual machine). They will improve and evolve over time but the real focus of innovation won't be on how good these "good enough" components are but instead on what is built with them. This concept, known as componentisation, is prevalent throughout our industrial history and shows one consistent theme: standardisation accelerates innovation.

So everything looks rosy … we'll have the economic benefits of cloud (economies of scale, increased agility, ability to focus on what matters), a competitive marketplace based around multiple providers competing on price vs QoS, the option to use providers or install ourselves or to mitigate risks with a hybrid approach, "open" APIs & data formats giving us access to our code and data, open sourced common reference models providing semantic interoperability, "good enough" components for ubiquitous and well defined activities which will cause an acceleration of innovation of new activities built upon these components … and so on.

Think again.

In all likelihood, we're going to end up with islands in the cloud: marketplaces built around specific ways of implementing a ubiquitous and well defined activity. Don't think of one set of "good enough" components but instead a range of different "good enough" components all doing roughly the same thing. Nuts? It is.

Hence, in the infrastructure layer you're likely to see islands develop around :-
  • EC2/S3 (e.g. the core of AWS), including open source implementations such as OpenStack, Eucalyptus and OpenNebula.
  • vCloud, principally provided through VMware technology.
  • a Microsoft infrastructure-based environment.
  • any OpenStack APIs, particularly if Rackspace implements them.
All of these will be providing their own versions of "good enough" units of virtual infrastructure. Within those islands you'll head towards multiple service providers or installations: a competitive marketplace with switching between installations and semantic interoperability based upon a common reference model. The open source projects such as OpenStack are likely to form assurance industries (think Moody's-style rating agencies and compliance bodies) to ensure portability between providers by comparison to the common reference model, whereas the proprietary technologies are likely to develop certification bodies (e.g. VMware Express).
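As a sketch of what such assurance might look like in practice (the endpoints and client classes here are entirely hypothetical stand-ins), conformance checking amounts to running the same operations against a provider and against the common reference model, then diffing the behaviour:

    # Hypothetical conformance check: compare a provider's behaviour
    # against the open source reference implementation for the same calls.

    class FakeEndpoint:
        """Stand-in for a provider or reference-implementation client."""
        def __init__(self, vm_status):
            self.vm_status = vm_status

        def create_vm(self, name):
            return {"name": name, "status": self.vm_status}

    def check_conformance(provider, reference, test_cases):
        """Run each operation against both endpoints and record divergence."""
        failures = []
        for name, operation in test_cases:
            got, expected = operation(provider), operation(reference)
            if got != expected:
                failures.append((name, expected, got))
        return failures

    tests = [("create_vm status", lambda api: api.create_vm("t1")["status"])]
    reference = FakeEndpoint("BUILDING")  # reference model's behaviour
    provider = FakeEndpoint("PENDING")    # provider diverges semantically
    print(check_conformance(provider, reference, tests))
    # [('create_vm status', 'BUILDING', 'PENDING')]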

Between islands there will be only syntactic interoperability (with exceptions such as OpenStack, which will try to span multiple islands), which means you'll require translation of systems from one island to another. Whilst management tools will develop (and have already started to) to cover multiple islands and translation between them, this process is imperfect and a constant exercise in chasing different APIs and creating a lowest common denominator (as per libcloud - see the sketch below). Of course, I wouldn't be surprised if the libcloud folk were hoping that as a community develops around them, the providers will offer libcloud as a native API. Such command & conquer strategies rarely succeed.
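For illustration, this is roughly what the lowest common denominator looks like with libcloud (a minimal sketch - the credentials are placeholders and the provider constants may vary between libcloud versions): the same few calls work against every provider precisely because they expose only what all providers have in common.

    # Apache Libcloud: one abstraction across many providers -- the
    # lowest common denominator approach described above.
    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    # Placeholder credentials -- substitute your own.
    drivers = [
        get_driver(Provider.EC2)('access key id', 'secret key'),
        get_driver(Provider.RACKSPACE)('username', 'api key'),
    ]

    for driver in drivers:
        # list_nodes() works against every supported provider, but
        # anything provider-specific (placement groups, spot pricing,
        # etc.) falls outside the common interface.
        for node in driver.list_nodes():
            print(driver.name, node.name, node.state)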

Given this complexity, and since there will be multiple service providers within an island, it's likely that consumers will tend to stick within one island. If we're lucky, some of these islands might die off before the problem becomes too bad.

Of course, these base components could affect the development of higher order layers of the computing stack and you are likely to see increasing divergence between these islands as you move up the stack. Hence, the platform space on the vCloud island will differ from the platform space on the EC2/S3 island. We will see various efforts to provide common platforms across both, but each will tend towards the lowest common denominator between the islands and never fully exploit the potential of any. Such an approach will generally fail compared to platforms dedicated to one island, especially since each island consists of multiple providers and hence already overcomes the general outsourcing risks (lack of second sourcing options etc.) within itself. Maybe we'll be lucky.

So, the future looks like multiple cloud islands, each consisting of many service providers complying with the standard of that island - whether vCloud, EC2/S3 or whatever - and increasing divergence in higher order systems (platforms, applications) between the islands. Whilst switching between providers on an island is straightforward, shifting between islands requires translation. This is not dissimilar to the Linux vs Windows worlds, with applications and platforms tailored to each. The old style of division will just continue with a new set of dividing lines in the cloud. Is that a problem?

Yes, it's huge if you're a customer.

Whilst cloud provides more efficient resources, consumption will go through the roof due to effects such as componentisation, the long tail of unmet business demand, co-evolution and increased innovation (Jevons' paradox). Invariably one of the islands will become more price efficient, i.e. there is no tax paid to a technology vendor who collects an annual licence and upgrade fee through a drip-feed process. It's this increased dependency combined with price variance which will result in operational inefficiencies for one competitor when compared to another who has chosen the more efficient island. The problem for the inefficient competitor will be the translation cost of moving wholesale from one island to another. This is likely to make today's translations look trivial and in all probability will be prohibitive. The inefficient competitor will therefore be forced to compete at a continual disadvantage or attempt to drive the technology vendor to reduce their taxation on the market.
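To put rough numbers on that trap (the figures below are entirely illustrative, not data), compare the ongoing vendor tax against the one-off cost of translating wholesale to another island:

    # Illustrative arithmetic only -- all figures are invented to show
    # the shape of the trap, not real prices.
    annual_compute_spend = 10_000_000  # $/year spent on the "taxed" island
    vendor_tax = 0.15                  # licence fees as a fraction of spend
    translation_cost = 8_000_000       # one-off cost of moving islands
    years = 5

    cost_of_staying = annual_compute_spend * vendor_tax * years
    breakeven_years = translation_cost / (annual_compute_spend * vendor_tax)

    print(f"Vendor tax over {years} years: ${cost_of_staying:,.0f}")
    print(f"One-off translation cost:     ${translation_cost:,.0f}")
    print(f"Break-even after {breakeven_years:.1f} years")
    # Even when the tax eventually exceeds the translation cost, the
    # up-front cost and risk is what keeps the inefficient competitor
    # trapped on the expensive island.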

The choices being made today (many are choosing islands based upon existing investment and political choices) will have significant long term impacts and may come to haunt many companies.

It's for these reasons that I've recommended to anyone getting involved in cloud to look for :-
  1. aggressively commoditised environments with a strong public ecosystem.
  2. signals that multiple providers will exist in the space.
  3. signals that providers in the space are focused on services and not bits.
  4. an open source reference implementation which provides a fully functioning and operating environment.
In my simple world, VMware is over-engineered and focuses on resilient virtual machines rather than commodity provision. It's ideal for a virtual data centre but we're talking about computing utilities, and it also suffers from being a proprietary stack. Many of the other providers offer "open" APIs but, as a point of interest, APIs can always be reverse engineered for interoperability reasons and hence there is no such thing as a "closed" API.

The strongest and most viable island currently resides around EC2/S3 with the various open source implementations (such as UEC), especially since the introduction of Rackspace & NASA's service-focused OpenStack effort.

I don't happen to agree with Simon Crosby that VMware's latest cloud effort Redwood == Deadwood. I agree with his reasoning for why it should be, and that they're on shaky ground in the longer term, but unfortunately I think many companies will go down the Redwood route for reasons of political capital and previous investment. IMHO, they'll eventually regret that decision.

If you want my recommendation, then at the infrastructure layer get involved with OpenStack. At the platform layer, we're going to need the same sort of approach. I have high hopes for SSJS (having been part of Zimki all those years back), so something like Joyent's Smart platform would be a step in the right direction.

---  Added 19th August 2013

Gosh, this is depressing. 

Three years and 15 days later Ben Kepes (a decent chap) writes a post on how we're coming to terms with what are basically "islands in the clouds".

OpenStack followed a differentiation road (which James Duncan and I raised as a highly dubious play with the Rackspace execs at the "OpenStack" party at OSCON in July 2010). They didn't listen and we didn't get the market of AWS clones. In all probability, if AWS compatibility had been the focus back in 2010 then the entire market around OpenStack could have been much larger than AWS is by now. But we will never know, and today OpenStack looks like it has almost given up on the public race and is heading for a niche private role.

In his article, Ben states that companies never wanted "cloud bursting" - a term which seems to mean a mix of 'live' migration (a highly dubious and somewhat fanciful goal, more easily managed by other means) and the ability to expand a system into multiple environments.

Dropping the 'live' term, both can be achieved easily enough with deployment and configuration management tools - one of the reasons why I became a big fan of Chef in '08/'09 (and not just because of my friend Jesse Robbins). This sort of approach is simple if you have multiple providers demonstrating semantic interoperability (i.e. providing the same API and the same behaviour), as your cost of re-tooling and management is small. It becomes unnecessarily complex as the number of islands grows.
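As a rough sketch of the non-'live' version (the provider class and configuration step below are hypothetical stand-ins for a real provider API plus a tool like Chef): expanding into a second environment is just provisioning on another provider and applying the same configuration, which is only cheap when both providers behave the same way.

    # Hypothetical sketch of "bursting" without live migration: when the
    # primary provider runs out of capacity, provision on a secondary one
    # and apply the same configuration.

    class FakeProvider:
        """Stand-in for a cloud provider's API."""
        def __init__(self, name, capacity):
            self.name, self.capacity = name, capacity

        def has_capacity(self):
            return self.capacity > 0

        def create_node(self, role):
            self.capacity -= 1
            return f"{self.name}/{role}-{self.capacity}"

    def apply_config(node, role):
        # With semantic interoperability the same recipes run everywhere;
        # with merely syntactic interoperability, each island needs its
        # own tooling and this stops being simple.
        print(f"configuring {node} as {role}")

    def provision(providers, role, count):
        """Create `count` nodes across providers in preference order."""
        nodes = []
        for provider in providers:
            while len(nodes) < count and provider.has_capacity():
                node = provider.create_node(role)
                apply_config(node, role)
                nodes.append(node)
        return nodes

    primary = FakeProvider("primary", capacity=2)
    secondary = FakeProvider("secondary", capacity=10)
    provision([primary, secondary], "web", count=5)  # bursts to secondary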

Anyway, that aside, the one comment I'll make on Ben's post is that the goal was never "cloud bursting" but instead second sourcing options and balancing the buyer / supplier relationship. Other than that, a good but depressing post.