Monday, April 12, 2010

Use cloud and get rid of your sysadmin.

Following on from my Cloud Computing Myths post.

The principal argument behind cloud getting rid of sysadmins is that "pre-cloud a sysadmin can manage a few hundred machines, in the cloud era with automation a sysadmin can manage tens of thousands of virtual machines". In short, since sysadmins will be able to manage two orders of magnitude more virtual machines, we will need fewer of them.

First, let's be clear about what automation means. At the infrastructure layer of the computing stack there is a range of systems, commonly known as orchestration tools, which allow for basic management of a cloud, automatic deployment of virtual infrastructure, configuration management, self-healing, monitoring, auto-scaling and so forth. These tools take advantage of the fact that in the cloud era infrastructure is code: it is created, modified and destroyed through APIs.
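As a rough illustration of what "self-healing" and "auto-scaling" boil down to, most orchestration tools run some form of desired-state loop: compare what you asked for with what is actually running, then emit actions to close the gap. The sketch below is hypothetical and not taken from any of the tools named in this post; instances are plain hashes and the actions are returned rather than sent to a real provider API.

```ruby
# Hypothetical desired-state loop. Instance records are plain hashes; a real
# orchestration tool would query and call a cloud provider's API instead.

# Compare the desired instance count with the current fleet and return the
# actions needed to converge: [:terminate, id] or [:launch, nil].
def reconcile(desired_count, instances)
  healthy   = instances.select { |i| i[:healthy] }
  unhealthy = instances.reject { |i| i[:healthy] }

  actions = []
  # Self-healing: retire anything that has failed its health check.
  unhealthy.each { |i| actions << [:terminate, i[:id]] }

  # Scaling: launch or retire healthy instances until the count matches.
  shortfall = desired_count - healthy.size
  if shortfall > 0
    shortfall.times { actions << [:launch, nil] }
  elsif shortfall < 0
    healthy.last(-shortfall).each { |i| actions << [:terminate, i[:id]] }
  end
  actions
end

fleet = [
  { id: "vm-1", healthy: true },
  { id: "vm-2", healthy: false },
  { id: "vm-3", healthy: true }
]
p reconcile(3, fleet) # => [[:terminate, "vm-2"], [:launch, nil]]
```

Run repeatedly, the same function both replaces failed machines and grows or shrinks the fleet, which is why a handful of people can look after a very large number of instances.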

Rather than attempting to create specialised infrastructure, the cloud world takes advantage of a bountiful supply of virtual machines provided as standardised components. Hence scaling is achieved not by provisioning an ever more powerful machine but by deploying vastly more standardised virtual machines.

Furthermore, the concept of a machine also changes. We're moving away from the idea of a virtual machine image for this or that, towards a basic machine image plus all the runtime information required to configure it. The same base image will become a wiki, a web server or part of an n-tier system.
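A minimal sketch of the "one base image plus runtime information" idea follows. The image id, role names and service lists are hypothetical placeholders; the point is only that every role launches from the same image and differs purely in the configuration handed to it at boot.

```ruby
# Hypothetical sketch: one base image, many roles, selected at launch time.
# The image id, role names and services below are illustrative only.
BASE_IMAGE = "base-image-xxxxx"

ROLES = {
  wiki:       { services: %w[httpd wiki], port: 80 },
  web_server: { services: %w[httpd],      port: 80 },
  app_tier:   { services: %w[app],        port: 8080 }
}.freeze

# Build a launch specification: the same image plus role-specific run-time data.
def launch_spec(role)
  config = ROLES.fetch(role) { raise ArgumentError, "unknown role: #{role}" }
  { image_id: BASE_IMAGE, user_data: { role: role.to_s }.merge(config) }
end

spec = launch_spec(:wiki)
puts spec[:image_id]         # identical for every role
puts spec[:user_data][:role] # prints wiki
```

Keeping the image generic means there is only one artefact to patch and test; the variety lives in data, which is exactly what orchestration tools manage.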

All of these capabilities allow for more ephemeral infrastructure, changing rapidly according to need through rapid deployment and destruction. This creates a range of management problems, hence the growing interest in orchestration tools. These tools vary from narrowly focused components to more general solutions and include Chef, ControlTier, CohesiveFT, Capistrano, RightScale, Scalr and the list goes on.

A favourite example of mine, simply because it points towards the future, is PoolParty. Using a simple syntax for describing infrastructure deployment, PoolParty synthesises the core concepts of this infrastructure change. For example, deploying a system no longer involves a long architectural review and planning process, an RT ticket requesting some new servers with an inevitable wait, and the installation, racking and configuration of those servers followed by change control meetings.

Deploying a system becomes, in principle, as simple as:

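pool "my_application" do
  cloud "my_application_server" do
    using :ec2
    instances 1..1      # min..max count; note 1...1 would be an empty range in Ruby
    image_id "xxxxx"
    autoscale           # illustrative: let the pool scale this tier
  end

  cloud "my_database_server" do
    using :ec2
    instances 1..1
    image_id "xxxxx"
  end

end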

It is these concepts of infrastructure as code and automation through orchestration tools, combined with a future of computing resources provided as larger components (pre-built racks and containers), which have led many to assume that cloud will remove the roles of many sysadmins. This is a weak assumption.

A historical review of computing resource usage shows that demand for it is price elastic. In short, as the cost of providing a unit of compute resource has fallen, demand has increased, leading to today's proliferation of computing.

Now, depending upon who you talk to, the wastage of computing resources in your average data centre runs at 80-90% (i.e. utilisation of only 10-20%). Adoption of private clouds should (ignoring the benefits of using commodity hardware) provide a 5x reduction in price per unit. Based upon historical precedent, you could expect the reduction to be much greater in public cloud, leading to a 10-15x increase in consumption as the long tail of applications that companies desire becomes ever more feasible.
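The 5x figure follows from simple arithmetic if you assume (my assumption for this sketch, not a measured result) that the cost of each useful unit of compute scales with one over utilisation, and that consolidation can push utilisation close to 100%.

```ruby
# Sketch: how wastage translates into a price-per-useful-unit reduction.
# Assumption: cost per useful unit scales with 1 / utilisation, and a cloud
# can raise utilisation to `target_utilisation`. Hardware savings are ignored.
def reduction_factor(wastage, target_utilisation = 1.0)
  current_utilisation = 1.0 - wastage
  target_utilisation / current_utilisation
end

[0.80, 0.90].each do |w|
  puts format("%.0f%% wastage -> %.1fx cheaper per useful unit", w * 100, reduction_factor(w))
end
```

At 80% wastage this gives 5x and at 90% it gives 10x; a realistic target utilisation below 100% pulls both numbers down, which is why 5x is a reasonable figure for private cloud.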

Of course, this ignores transient applications (those with a short lifetime such as weeks, days or hours), componentisation (e.g. self service and use of infrastructure as a base component), co-evolution effects and the larger economies of scale potentially available on public providers.

Given Moore's law, the current level of wastage, a standard VM-to-physical-server conversion rate, greater efficiencies in public provision, increasing use of commodity hardware and the assumption that expenditure on computing resources will remain flat (any reduction in cost per unit being offset by an increase in workload), it is entirely feasible that within 5-7 years these effects could lead to a 100x increase in virtual infrastructure (i.e. the number of virtual servers compared to current physical servers). It's more than possible that in five years' time every large marketing department will have its own 1,000-node Hadoop cluster for processing data on consumer behaviour.
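To see how quickly these effects compound under a flat budget, here is an illustrative calculation. The doubling periods (18 and 24 months) and the 5x utilisation gain are assumptions chosen for the sketch, not figures from any specific study; the other effects listed above (public provision, commodity hardware, VM density) are deliberately left out.

```ruby
# Illustrative only: doubling periods and the 5x utilisation gain are
# assumptions for this sketch, not measured figures.

# Price-performance gain after `years` if it doubles every `doubling_months`.
def moores_law_factor(years, doubling_months)
  2.0**(years * 12.0 / doubling_months)
end

# With a flat budget, units bought grow by the same factor that price falls:
# budget / (old_price / reduction) == current_units * reduction.
def units_at_flat_spend(current_units, price_reduction)
  current_units * price_reduction
end

UTILISATION_GAIN = 5.0 # from removing ~80% wastage, as discussed above

[18, 24].each do |months|
  [5, 7].each do |years|
    factor = moores_law_factor(years, months) * UTILISATION_GAIN
    puts format("doubling every %d months, %d years: ~%.0fx", months, years, factor)
  end
end
```

Hardware improvement and utilisation alone land somewhere between roughly 28x and 127x here, before public-cloud efficiencies, commodity hardware and VM density are added in, so a 100x increase in virtual infrastructure is well within range.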

So, we come back to the original argument: "pre-cloud a sysadmin can manage a few hundred machines, in the cloud era with automation a sysadmin can manage tens of thousands of virtual machines". The problem with this argument is that if cloud develops as expected, each company will be managing two orders of magnitude more virtual machines. A hundredfold increase in machines per sysadmin is then cancelled out by a hundredfold increase in machines, which means there'll be at least as many sysadmins as there are today.
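The cancellation is easy to check with numbers. The per-admin figures below are hypothetical stand-ins for "a few hundred" (200) and "tens of thousands" (20,000); the fleet sizes are the 2,000 physical and 200,000 virtual servers discussed in the P.P.S.

```ruby
# Hypothetical figures: 200 machines per admin pre-cloud ("a few hundred")
# and 20,000 per admin with automation ("tens of thousands").
def admins_needed(machines, machines_per_admin)
  (machines.to_f / machines_per_admin).ceil
end

before = admins_needed(2_000, 200)      # physical servers today
after  = admins_needed(200_000, 20_000) # 100x more virtual servers
puts "before: #{before}, after: #{after}" # prints "before: 10, after: 10"
```

Any productivity gain from automation is absorbed by the growth in the estate, so headcount only falls if machine count grows more slowly than the machines-per-admin ratio.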

Now, whilst the model changes when it comes to platform and software as a service (and there are complications here which I'll leave for another day), the assumption that cloud will lead to fewer system administrators is another one of those cloud myths which hasn't been properly thought through.

P.S. The nature of the sysadmin role will change and their skillsets will broaden; however, if you're planning to use cloud to reduce their numbers then you might be in for a nasty shock.

P.P.S. Just to clarify, I've been asked by a company which runs 2,000 physical servers whether this means that in 5-7 years they could be running 200,000 virtual servers (some provided by private clouds and most by public clouds, ideally through an exchange or brokers). This is exactly what I mean. You're going to need orchestration tools just to cope, and you'll need sysadmins skilled in these tools and in managing a much more complex environment.