Friday, January 10, 2014

Cloud Recap + five and a bit years

tl;dr when faced with a system which claims to be PaaS, ask yourself - can I just write code, build data and consume services? If you need to do anything other than this then it's not PaaS.

Five and a bit years ago I wrote a post called 'Cloud Recap', a summary of the previous years and the state of play in 2008. In this post, I'd like to revisit that earlier topic and also a concern I raised in 2009 about how 'Platform is being used at all layers of the stack, so we hear of cloud platforms for building IaaS and many other mangled uses of the concepts'.

In the early days ('05-'07), back when we had systems like BungeeLabs and Zimki, we had a simpler division of the stack, though one somewhat misguided by the tendency to create too many layers. The key was always the division of responsibility, i.e. with different layers of the stack, part of the solution became someone else's problem. Of course, that created new problems related to second sourcing options and buyer / supplier relationships, and hence portability was going to be an issue.

Since then the terms have changed: hardware is now infrastructure and frameworks are now platforms. The latter change is unfortunate because Framework as a Service clearly spelt out that you would be coding (and adding data, consuming services) in a framework, whereas Platform as a Service could end up meaning a multitude of things - a coding platform, a deployment platform etc. This is what has happened, hence my concern back in '09.

To explain the difference, and where I consider there to be a problem and a divergence from the original path, I'm going to describe today's layers of the stack in terms of the old view (which I happen to think is cleaner).

I'll first take a purist view. So, in reverse order, from figure 1:

Figure 1 - An overview of the stack, old and new




The lowest layer of the stack relates to the provision of virtual hardware, either through virtual machines or containers such as LXC. Along with the evolution of the activity (computing evolving from product to utility), we expected to see a co-evolution of practice (as in architectural practice). We've seen this with the shift from scale-up to scale-out, N+1 to design for failure, disaster recovery to chaos engines etc. This has given rise to the 'DevOps' movement, and hence a host of tools has developed around configuration management, auto scaling, auto deployment, policies etc, from Chef to Juju. This has unsurprisingly extended to concepts such as an application store with entire images, configuration and policy information bundled together. This low level of the stack is known as IaaS.

The second layer of the stack relates simply to the use of code, data and services. The underlying components, including the management layers, are de-coupled from the perspective of the user. The user is only concerned with writing code, building data and consuming services. These sorts of concepts are encapsulated in systems like GAE, Heroku and Cloud Foundry. Obviously behind the scenes there is a wealth of configuration management, auto-scaling, use of VMs / containers etc. This layer of the stack is known as PaaS.
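As a rough illustration of what "only code, data and services" can look like from the developer's side, here is a minimal sketch in Python. It assumes a platform that runs a standard WSGI callable and hands the address of a bound service to the application through an environment variable; the variable name `SERVICE_URL` and the helper `call_locally` are hypothetical and not taken from any particular platform. Nothing in it mentions VMs, containers, scaling or configuration management.

```python
import os
from wsgiref.util import setup_testing_defaults

# Hypothetical name: assumes the platform injects the address of a bound service.
SERVICE_URL = os.environ.get("SERVICE_URL", "unbound")


def application(environ, start_response):
    """The developer's code: a plain WSGI callable and nothing else."""
    body = f"hello, service={SERVICE_URL}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_locally(app):
    """Invoke the app in-process, roughly as a platform's web server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = status
        captured["headers"] = response_headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_locally(application))
```

Where the callable runs, how many copies exist and what sits underneath are all left to the platform, which is the division of responsibility described above.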

The third layer of the stack (application) relates to the provision of entire applications and application services e.g. Salesforce. This remains relatively intact.

From a purist point of view, each of these layers would eventually be built on the other, and each higher layer would provide increasing speed of creation of higher order systems and agility at the cost of decreasing flexibility. Hence you would also get application stores of SaaS applications built on PaaS, along with PaaS built on IaaS.

Flexibility vs speed of higher order systems creation is the inevitable trade-off that componentisation creates (a good examination of this is provided by Herbert Simon). Key to this, and to the ideas of componentisation, is the minimisation of lower orders to good enough standards. So, for example, in the IaaS space you would expect to see a limited range of virtual machine types (as you do with AWS). In the platform space, you would expect to see a limited range of coding environments (e.g. buildpacks).

However, there has been a divergence from this path, in particular with the idea of Application Containers. In these environments, applications are described in discrete virtual machines or containers along with configuration and policy information. Those application deployment environments, and the related app stores which manage these images, are now unfortunately called PaaS.

Now, there's nothing wrong with application stores containing VM images or Containers or Cartridges for an application or an application component, offering autoscaling and configuration management with one click install, and there have been many efforts to provide effective management of this (e.g. early CohesiveFT, Juju, OpenShift Cartridges etc). But to encourage development of higher order systems on a plethora of different base components is foolhardy and likely to create a sprawling mess in the future. I'm not having a pop at Docker here, because I happen to like Linux containers, but I'm far from convinced that describing containers and the management of them as some form of development platform is wise.

You might have specific containers for specific languages and your code and data will be bound to them - which is all good. But it's the limitation of choice which is key for effective componentisation. Creating a platform solely on the idea of configuration of containers might give you freedom and flexibility but it would be the equivalent of enabling a multitude of Perl compilers for different purposes. There is a significant long term cost in terms of management, portability and ultimately agility as more effort becomes focused on using the right container rather than simply coding. This is why I happen to like buildpacks and, for the record, a limited number of buildpacks as the basis for a platform.
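To make the idea of a deliberately limited choice concrete, here is a small sketch in Python of a platform that supports a short, fixed list of runtimes and picks one by inspecting the pushed files, so the developer never selects an environment. The list, the marker files and the function name `detect_buildpack` are assumptions made for illustration; real buildpacks carry their own detection logic and this is not a description of any specific platform's implementation.

```python
# Hypothetical, deliberately short list of supported runtimes.
# Each entry pairs a buildpack name with a marker file assumed to identify it.
BUILDPACKS = (
    ("python", "requirements.txt"),
    ("nodejs", "package.json"),
    ("ruby", "Gemfile"),
)


def detect_buildpack(filenames):
    """Return the first supported buildpack whose marker file is present.

    The developer is never asked to choose or configure an environment;
    code either fits one of the few supported runtimes or is rejected.
    """
    present = set(filenames)
    for name, marker in BUILDPACKS:
        if marker in present:
            return name
    raise LookupError("no supported buildpack for this code")


if __name__ == "__main__":
    print(detect_buildpack(["app.py", "requirements.txt"]))
```

The rejection branch is the point of the sketch: the constraint is what keeps the set of base components small enough to manage.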

In the past, there was a very VERY hard line between what used to be called Frameworks and the underlying Hardware components. In a framework, I would write code, build data and consume services and that's all. If my framework has to ask me what environment I should use then it's not a framework. This is how the original Platform as a Service environments appeared to be designed. Unfortunately, the term has now morphed to include all sorts of things.
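The hard line described here can be written down as a simple check. The following Python sketch is only an illustration: the activity labels and function names are invented for the example, and deciding what a given product really requires of its users is a judgement the code cannot make for you.

```python
# The only three activities a developer should be asked to perform.
PAAS_ACTIVITIES = frozenset({"write code", "build data", "consume services"})


def extra_demands(required_activities):
    """List everything a system asks of the developer beyond the three basics."""
    return sorted(set(required_activities) - PAAS_ACTIVITIES)


def passes_litmus_test(required_activities):
    """True only if the system demands nothing beyond code, data and services."""
    return not extra_demands(required_activities)


if __name__ == "__main__":
    candidate = ["write code", "consume services", "choose container image"]
    print(passes_litmus_test(candidate), extra_demands(candidate))
```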

I raise this because of the Gartner MQ series on Enterprise Platforms, which contains many that I would describe as true 'platforms' but also others which can best be described as deployment, configuration, autoscaling and image / container management environments. The latter have a role to play, certainly for single click deploy of applications, but if you intend to use these as development platforms then I would caution you to think carefully about sprawl before you end up having to build another system to cope with a mass of different base components.

Flexibility has a cost; it is not your friend. Such choice won't benefit development any more than being able to choose from a million different types of bricks would help housebuilding.

PaaS is all about writing code, building data and consuming services ... nothing else. It's the embodiment of what is known as 'NoOps' (a fairly awful and misleading term but then as an industry we're good at this - cloud etc). This doesn't mean that there are literally no ops but instead that ops is hidden behind the interface. In a PaaS world, the developer doesn't care about VMs, containers, auto scaling and configuration between devices any more than they care about what hardware is used. The developer shouldn't even normally care about buildpacks, just as long as one which will run their code exists on the platform. All the developer cares about is their code.

Naturally there is lots of misunderstanding and inertia around these concepts. That's normal, but NoOps is going to happen whether people like it or not. Get used to it. Of course, that won't stop vendors marketing endless configuration, autoscaling and deployment systems as PaaS. Certainly, all these subcomponents are part of PaaS but they should be invisible.

So when faced with a system which claims to be PaaS, ask yourself - can I just write code, build data and consume services? If you need to do anything other than this then it's not PaaS.