Wednesday, February 25, 2015

What's wrong with my private cloud ...

These days, I tend not to get too involved in cloud (having retired from the industry in 2010 after six years in the field) and my focus is on improving situational awareness and competition (in my view a much bigger problem) through techniques such as mapping. I do occasionally stick my oar into the cloud debate when very elementary mistakes reappear. One of those has raised its head again, namely the cost advantages / disadvantages of private cloud.

First, private cloud was always a transitional play which would ultimately head to niche unless a functioning competitive market formed, enabling a swing from centralised back to decentralised. This functioning market doesn't exist at the infrastructure layer and so the trend to niche for private cloud is likely to accelerate. There's a whole bunch of issues related to the impact of ecosystems (in terms of innovation / customer focus and efficiency rates), performance and agility, effective DevOps, financial instruments (spot market, reserved instances), capacity planning / use, focus on service operation and potential for sprawl which can work out very negatively for private cloud, but these are beyond the scope of this post. I simply want to focus on cost and, to make my life easier, just the cost of compute.

Here's the problem. Two competitors are going head to head in a field, both with over £15Bn in annual revenue and both spending over £150M p.a. on infrastructure (i.e. compute, hosting etc). Now, this IT component represents a good but not huge chunk of their annual revenue (around 1%).

One of the competitors was building a private cloud. It was quite proud of the fact that it reckoned it had achieved almost parity with AWS EC2 (using an m3.medium as the comparison point) at around $800 per instance per year. What made them happy is that the equivalent public Amazon cost is around $600 per year, so they weren't far off and they gained all the "advantages" of being a private cloud. This is actually a disaster and was caused by three basic mistakes (ignoring all the other factors listed above which make private cloud unattractive).

Mistake 1 - Externalities

The first question I asked (as I always do) is what percentage of the cost was power? The response was the usual one that fills me with dread - "That comes from another budget". Ok, when building a private cloud you need to take into consideration all costs from power, building, people, cost of money etc. On average the hardware & software component tends to be around 20-25% of the total cost, with power taking the lion's share. So, they weren't operating at anywhere close to $800 per equivalent per year but instead closer to $3,000 per year. On the face of it this was a 5x differential, but then there's the next issue - future pricing.
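To make the arithmetic concrete, here's a rough back-of-the-envelope sketch. It assumes the quoted $800 covers hardware & software only and that this is roughly a quarter of the true total cost - the figures are illustrative, not theirs:

```python
# Back-of-the-envelope sketch of the externalities mistake.
# Assumption: the quoted $800/year covers only hardware & software,
# which typically represents ~20-25% of total cost once power,
# building, people, cost of money etc. are included.

quoted_hw_sw_cost = 800        # $/instance/year, hardware & software only
hw_sw_fraction = 0.25          # assumed share of total cost (20-25% range)
aws_m3_medium_price = 600      # $/instance/year, public AWS price

true_private_cost = quoted_hw_sw_cost / hw_sw_fraction
print(f"True private cost ~ ${true_private_cost:,.0f} per instance per year")
print(f"Differential vs AWS ~ {true_private_cost / aws_m3_medium_price:.0f}x")
```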

Mistake 2 - Future Pricing

Many years ago I calculated how efficiently you could run a large scale cloud and from this guesstimated that AWS EC2 was running at over 80% margin. But how could this be? Isn't Amazon the great low margin, high volume business? The problem is constraint. 

AWS EC2 is growing rapidly and compute happens to be price elastic i.e. the more the price is reduced, the more we consume. However, there's a constraint in that it takes time, money and resources to bring large scale data centres online. With such constraints it's relatively easy to reduce the price so much that demand exceeds your ability to supply, which is the last thing you want. Hence you have to manage a gentle decline in price. In the case of AWS I guesstimated that they were focused on doubling total capacity each year. Hence, they'd have to manage the decline in price to ensure demand stayed within the bounds of their supply, factoring in the natural reduction of underlying costs. There are numerous techniques that can also help (e.g. increasing the size of the default instance) but we won't get into that.
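As an illustration of that constraint (a toy model, not AWS's actual one), assume demand is price elastic with some hypothetical elasticity and that capacity can at best double each year - then price can only fall so fast before demand outruns supply:

```python
# Toy model of the supply constraint described above (illustrative only).
# Assumption: demand for compute follows D(P) = D0 * (P0 / P)**e for a
# hypothetical price elasticity e, and total capacity can at best double
# each year. Keeping demand within supply means price can fall no faster
# than a factor of 2**(1/e) per year.

elasticity = 2.0               # assumed price elasticity of demand (hypothetical)
annual_capacity_growth = 2.0   # capacity doubling each year

max_annual_price_cut = 1 - annual_capacity_growth ** (-1 / elasticity)
print(f"Fastest sustainable price decline ~ {max_annual_price_cut:.0%} per year")
```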

Though their price is currently $600 per year, I take the view that their costs are likely to be sub-$100, which means a lot of future price cuts are on the way. The recent 'price wars' from Google seem more about Google trying to find where the price points / constraints of Amazon are rather than a fully fledged race to the bottom. All the competitors have to keep a watchful eye on demand and supply.
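A minimal sketch of that guesstimate, using the figures above (the cost floors are my assumptions, not published numbers):

```python
# Sketch of the future-pricing argument, using the guesstimates above.
# Assumption: the current m3.medium-equivalent price is ~$600/year and
# the underlying cost is somewhere in the $100-$150/year range.

current_public_price = 600     # $/instance/year
estimated_cost_floor = 100     # $/instance/year (my guesstimate, "sub-$100")
pessimistic_cost_floor = 150   # $/instance/year (less efficient assumption)

margin = 1 - estimated_cost_floor / current_public_price
print(f"Implied gross margin ~ {margin:.0%}")   # i.e. "over 80% margin"

# Room for future price cuts before hitting the assumed cost floors:
for floor in (estimated_cost_floor, pessimistic_cost_floor):
    print(f"Price could fall towards ${floor} - roughly a "
          f"{current_public_price / floor:.0f}x reduction")
```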

Let us assume however, that AWS is less efficient than I think and the best price they could achieve is $150. This creates a future 20x differential between the real cost of the private environment (~$3,000) and the public one. However, the reasoning goes, it's no great shakes: even if the competitor was using all public cloud (i.e. data centre zero), which is unlikely, then they'd simply be spending $7.5M compared to our $150M, and whilst that might be around $140M of saving it's peanuts compared to our revenue ($15Bn+) and the business that's at stake. It's not worth the risk.
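For clarity, here's the static comparison behind that reasoning as a quick sketch (same illustrative figures as before - the $150 floor and the ~$3,000 true private cost):

```python
# The (flawed) static comparison: hold today's workload constant and
# compare spend. Figures as per the example above.

private_true_cost = 3000       # $/instance/year (from mistake 1)
public_future_price = 150      # $/instance/year (pessimistic AWS floor)
differential = private_true_cost / public_future_price   # ~20x

our_annual_spend = 150_000_000                            # $ p.a. on infrastructure
competitor_spend = our_annual_spend / differential        # same workload, public cloud
saving = our_annual_spend - competitor_spend

print(f"Differential ~ {differential:.0f}x")
print(f"Competitor spend for the same workload ~ ${competitor_spend / 1e6:.1f}M")
print(f"'Saving' ~ ${saving / 1e6:.0f}M - small against $15Bn+ revenue, "
      f"hence 'not worth the risk'")
```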

This couldn't be more wrong.

Mistake 3 - Future Demand

Cloud computing simply represents the evolution of a world of products to a world of commodity and utility services. This process of "industrialisation" has repeated many times before in our past and has numerous known effects from co-evolution of practice (hence DevOps), rapid increases in higher order systems, new forms of data (hence our focus on big data), increases in efficiency, a punctuated equilibrium (exponential change), failure of companies stuck behind inertia barriers etc etc.

There's a couple of things worth noting. There exists a long tail of unmet business demand for IT related projects. Compute resources are price elastic. The provision of commodity (+utility) forms of an activity enables the rapid development of often novel higher order systems (utility compute allows a growth in analytics etc) which in turn evolve and over time become industrialised themselves (assuming they're successful). All of this increases demand for the underlying component.

Here's the rub. In 2010 I could buy a million times more compute resource for the same money than in the 1980s, but does that mean my IT budget for compute has reduced a million-fold during that time? No. What happened is that I did more stuff.

In the same way, cloud is extremely unlikely to reduce IT expenditure on compute (bar a short blip in certain cases) because we will end up doing more stuff. Why? Because we have competitors and as soon as they start providing new capabilities we have to co-opt / adapt etc.

So, let us take our 20x differential (which in all likelihood is much higher) and assume we're building our private cloud for 5yrs+, giving time for the price differential to become clear. Our competitor isn't going to reduce their IT infrastructure spending; they are likely to continue spending $150M p.a. but do vastly more stuff. In order to maintain parity given the differential, we're going to need to spend closer to $3Bn p.a. In reality we won't do this, but we will spend vastly more than we need and still lose ground to competitors - they'll have better capabilities than we do.
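The same figures viewed dynamically (again illustrative - the differential is assumed to be 20x and the competitor's spend held constant):

```python
# The dynamic view (mistake 3): the competitor keeps spending the same
# $150M p.a. but, at ~1/20th of the unit cost, consumes ~20x the compute.
# Matching that capability with a 20x more expensive private environment
# means multiplying, not reducing, the budget.

competitor_spend = 150_000_000   # $ p.a., unchanged - they just do more stuff
differential = 20                # assumed private vs future public unit cost

spend_needed_for_parity = competitor_spend * differential
print(f"Spend needed to maintain capability parity ~ "
      f"${spend_needed_for_parity / 1e9:.0f}Bn p.a.")
```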

You must keep in mind that some vendors are going to lose out rather drastically to this shift towards utility services. However, they can limit the damage if you put yourself in a position where you need to buy vastly more equipment (due to the price differential) just to keep up with your competitors. This is not being done for your benefit. When you're losing your lunch, sometimes you can recover by feasting on a few others. If that means playing to the inertia of would-be meals (FUD, security, loss of jobs etc) then all is fair in war and business. You must ask yourself whether this is the right course of action or whether you're being lined up as someone's meal ticket.

Now, I understand that there are big issues with heading towards data centre zero and migrating to public cloud, often related to legacy environments. This is a major source of inertia, particularly as architectural practices have to change to cope with volume operations of good enough components (cloud) compared to past product based practices (e.g. the switch from scale-up to scale-out, or from N+1 to design for failure). We've had plenty of warning on this and there are all sorts of sensible ways of managing it, from the "sweat and dump" of legacy to (in very niche cases) limited use of private.

But our exit cost from this legacy will only grow over time as we add more data. We're likely to see a crunch on cloud skills as demand rockets, and this isn't going to get easier. There is no magic "enterprise cloud" which delivers future pricing parity and all the benefits of commodity and volume operations whilst being built from non-commodity products customised to us in order to make our legacy life easier. The economics just don't stack up.

By now, you should be well on your way to data centre zero (several companies are likely to reach this in 2015). You should have been "sweating and dumping" those legacy (or what I prefer to call toxic) IT assets over the last four+ years. You should have only very limited private capacity for those niche cases which you can't migrate, unless you can achieve future pricing parity with EC2 (be honest here - are you including all the costs? What is your cost comparison really?). You should have been talking with regulators to solve any edge cases (where they actually exist). You should be well on the way to public adoption.

If you're not then just hope your competitors return the favour. Be warned, don't just believe what they say but try and investigate. I know one CIO who has spent many years telling conferences why their industry couldn't use public cloud whilst all the time moving their company to public cloud. 

Some companies are going to be sorely bitten by these three mistakes of externalities, future pricing and future demand. Make sure it isn't you.

-- Additional note

Be very wary of the hybrid cloud promise, and if you're going down that route make sure it translates to "a lot of public with a tiny bit of private". This is a trade-off but you don't want to get on the wrong side of it and over-invest in private. There's an awful lot of vendors out there encouraging you to do what is actually not in your best interest by playing to inertia (cost of acquiring new practices, cost of change, existing political capital etc). There are some very questionable "consultant" CIOs / advisors giving advice based upon not a great deal of experience and some dubious beliefs.

Try and find someone who knows what they're talking about and has experience of doing this at reasonable scale, e.g. Bob Harris, former CTO of Channel 4, or Adrian Cockcroft, former Director of Engineering at Netflix. These people exist - go get some help if you need it and be very cautious about the whole private cloud space.

-- Additional note

Carlo Daffara has noted that in some cases the TCO of private cloud (for very large, well-run environments) can get down to 1/5th of AMZN public pricing. Now, there are numerous ecosystem effects around Amazon along with all sorts of issues regarding variability, security of power, financial instruments (reserved instances etc) and many of those listed above which create a significant disadvantage for private cloud. But in terms of pure pricing (only part of the puzzle), 1/5th of AMZN public pricing is just on the borderline of making sense for the immediate future. Anything above this and you're into dangerously risky ground.
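A quick check of why 1/5th is the borderline, using the same assumed figures from earlier in this post:

```python
# Why 1/5th of current AMZN public pricing is borderline, using the
# assumed figures from earlier in this post.

current_public_price = 600                     # $/instance/year (m3.medium-equivalent)
best_private_tco = current_public_price / 5    # Daffara's best case: ~$120/year

estimated_cost_floor = 100                     # guesstimated AWS cost ("sub-$100")
pessimistic_cost_floor = 150                   # less efficient assumption

print(f"Best-case private TCO ~ ${best_private_tco:.0f} per instance per year")
print(f"That sits between the assumed future public price floors of "
      f"${estimated_cost_floor}-${pessimistic_cost_floor} - hence 'borderline'")
```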