Working with Microsoft’s cloud over the past couple months has been a real eye opener. Whilst I used to scoff at all these people eschewing the norms that have (and continue to) serve us well in favor of the latest technology du’jour I’m starting to see the benefits of their ways, especially with the wealth of resources that Microsoft has on the subjects. Indeed the cloud aspects of my latest side project, whilst consuming a good chunk of time at the start, have required almost no tweaking whatsoever even after I change my data model or fundamental part of how the service works. There is one architectural issue that continues to bug me however and recent events have highlighted why it troubles me so.
The events I’m referring to are the recent outage to Amazon’s Elastic Block storage service that affected a great number of web services. In essence part of the cloud services that Amazon provides, in this case a cloud disk service that for all intents and purposes is the same as a hard drive in your computer, suffered a major outage in one of their availability zones. This meant for most users in that particular zone that relied on this service to store data their services would begin to fail just as your computer would if I ripped its hard drive out whilst you were using it. The cause of the events can be traced back to human error but it was significantly compounded by the high level of automation in the system, which would be needed for any kind of cloud service at this scale.
For a lot of the bigger users of Amazon’s cloud this wasn’t so much of an issue since they usually have replicas of their service on the geographically independent mirror of the cloud that Amazon runs. For a lot of users however their entire service is hosted in the single location, usually because they can’t afford the additional investment to geographically disperse their services. Additionally you have absolutely no control over any of the infrastructure so you leave yourself at the mercy of technicians of the cloud. Granted they’ve done a pretty good so far but you’re still outsourcing risk that you can’t mitigate, or at least not affordably.
My preferred way of doing the cloud, which I’ve liked ever since I started talking to VMware about their cloud offerings back in 2008, was to combine the ideas of both self hosted services with the extensibility of the cloud. For many services they don’t need all the power (nor the cost) of running multiple cloud instances and could function quite happily on a few co-hosted servers. Of course there are times when they’d require the extra power to service peak requests and that’s where the on-demand nature of cloud services would really shine. However apart from vCloud Express (which is barely getting trialled by the looks of things) none of the cloud operators give you the ability to host a small private cloud yourself and then offload to their big cloud as you see fit. Which is a shame since I think it could be one way cloud providers could wriggle their way into some of the big enterprise markets that have shunned them thus far.
Of course there are other ways of getting around the problems that cloud providers might suffer, most of which involve not using them for certain parts of your application. You could also build your app on multiple cloud platforms (there are some that even have compatible APIs now!) but that would add an inordinate amount of complexity to your solution, not to mention doubling the costs of developing it. The hybrid cloud solution feels like the best of both worlds however it’s highly unlikely that it’ll become a mainstream solution anytime soon. I’ve heard rumors of Microsoft providing something along those lines and their new VM Role offering certainly shows the beginnings of that becoming a reality but I’m not holding my breath. Instead I’ll code a working solution first and worry about scale problems when I get to scale, otherwise I’m just engaging in fancy procrastination.