As always I’m not-so-secretly working on a side project of mine (although I’ve kept its true nature a secret from most) which utilizes Windows Azure as the underlying platform. I’ve been working on it for the past 3 months or so and whilst it isn’t my first Azure application it is the first one that I’ve actually put into production. That means I’ve had to deal with all the issues associated with doing that, from building an error reporting framework to making code changes that have no effect in development but fix critical issues when the application is deployed. I’ve also come to the realisation that some of the architectural decisions I made, ones done with an eye cast towards future scalability, aren’t as sound as I first thought they were.
I’ve previously touched on some of the issues and considerations that come with Azure Tables but what I haven’t dug into is the reasons you would choose to use it. On the surface it looks like a stripped down version of a relational database, missing some features but making up for it by being an extremely cheap way of storing a whole lot of data. Figuring that my application was going to be huge some day (as all us developers do) I made the decision to use Azure Tables for everything. Sure, querying the data was a little cumbersome but there were ways to code around that, and code around I did. The end solution does work as intended when deployed into production but there are some quirks which don’t sit well with me.
For starters, querying data from Azure Tables on anything but the partition key and row key will force a table scan. Those familiar with NoSQL style databases will tell me that that’s the point: storage services like these are optimized for key-based access and outside of that you’re better off using an old fashioned SQL database. I realised this when I was developing it, however the situations I had in mind fit in well with the partition/row key paradigm as often I’d need to get a whole partition, a single record or (and this is the killer) the entire table itself. Whilst Azure Tables might be great at the first 2 things it’s absolutely rubbish at the latter and this causes me no end of issues.
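To make the difference between those access patterns concrete, here is a minimal in-memory sketch. It is written in Python purely so it runs standalone and it does not use the Azure SDK: `ToyTable`, `build_demo_table`, the `score` property and the `entities_examined` counter are all hypothetical names invented for illustration. It only models the idea that a store indexed on two keys has to walk every entity when you filter on anything else.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table indexed only by partition and row key.

    Hypothetical model for illustration; this is not the Azure Tables API.
    """

    def __init__(self):
        # partition key -> {row key -> entity dict}
        self._partitions = defaultdict(dict)
        # Crude cost counter: how many entities a query had to look at.
        self.entities_examined = 0

    def insert(self, partition_key, row_key, **props):
        entity = {"PartitionKey": partition_key, "RowKey": row_key, **props}
        self._partitions[partition_key][row_key] = entity

    def get(self, partition_key, row_key):
        """Point lookup: both keys are known, so one entity is touched."""
        entity = self._partitions.get(partition_key, {}).get(row_key)
        if entity is not None:
            self.entities_examined += 1
        return entity

    def get_partition(self, partition_key):
        """Partition fetch: only the rows in that partition are touched."""
        rows = list(self._partitions.get(partition_key, {}).values())
        self.entities_examined += len(rows)
        return rows

    def where(self, predicate):
        """Filter on a non-key property: every entity has to be examined."""
        matches = []
        for rows in self._partitions.values():
            for entity in rows.values():
                self.entities_examined += 1
                if predicate(entity):
                    matches.append(entity)
        return matches


def build_demo_table(partitions=100, rows_per_partition=100):
    """Fill a ToyTable with partitions p0..pN, each holding rows r0..rM."""
    table = ToyTable()
    for p in range(partitions):
        for r in range(rows_per_partition):
            table.insert(f"p{p}", f"r{r}", score=r)
    return table
```

With the default 100 partitions of 100 rows, `get("p3", "r7")` examines 1 entity and `get_partition("p3")` examines 100, while `where(lambda e: e["score"] == 7)` examines all 10,000 to find its 100 matches. The real service adds network round trips and paging on top of that, which the sketch ignores.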
In the beginning I, like most developers, simply developed something that worked. This included a couple calls along the lines of “get all the records in this table then do something with each of them”. This worked well up until I started getting hundreds of thousands of rows needing to be returned which often ended with the query being killed long before it could complete. Frustrated I implemented a solution that attempted to iterate over all records in the table by requesting all of the records and then following the continuation tokens as they were given to me. This kind of worked although anyone who’s worked with Azure and LINQ will tell you that I reinvented the wheel by forgoing the .AsTableServiceQuery() method which does that all for you. Indeed the end result was essentially the same and the only way around it was to put in some manual retry logic (in addition to the regular RetryPolicy). This works but retrieving/iterating over 800,000 records takes some 5 hours to complete, unacceptable when I can do the same thing on my home PC in a minute or two.
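The "follow the continuation tokens, with manual retries" approach can be sketched roughly as below. This is not my production code and it does not call the Azure storage client: `make_fake_service`, `fetch_page`, `TransientError` and `iterate_all` are hypothetical names, and the fake service simply hands back integers in pages of 1000. The point it illustrates is retrying a failed page from the last token you were given rather than starting the whole walk again.

```python
import time

PAGE_SIZE = 1000  # records arrive 1000 at a time, as described in the post


class TransientError(Exception):
    """Stand-in for a timeout or dropped connection (hypothetical)."""


def make_fake_service(total_rows, fail_on_calls=()):
    """Build a fake fetch_page(token) -> (rows, next_token) function.

    Calls whose 1-based sequence number is in fail_on_calls raise
    TransientError, which lets the retry path be exercised.
    """
    state = {"calls": 0}

    def fetch_page(token):
        state["calls"] += 1
        if state["calls"] in fail_on_calls:
            raise TransientError(f"call {state['calls']} failed")
        start = 0 if token is None else token
        end = min(start + PAGE_SIZE, total_rows)
        rows = list(range(start, end))
        # A token of None means there are no more pages.
        next_token = end if end < total_rows else None
        return rows, next_token

    return fetch_page


def iterate_all(fetch_page, max_attempts=3, delay_seconds=0.0):
    """Yield every row, retrying each page from its own token on failure."""
    token = None
    while True:
        for attempt in range(1, max_attempts + 1):
            try:
                rows, next_token = fetch_page(token)
                break
            except TransientError:
                if attempt == max_attempts:
                    raise  # give up on this page after the last attempt
                time.sleep(delay_seconds)
        yield from rows
        if next_token is None:
            return
        token = next_token
```

For example, `list(iterate_all(make_fake_service(2500, fail_on_calls={2})))` still returns all 2500 rows in order, because the failed second call is retried with the same token. It does nothing about the underlying cost of each page, which is the real problem described here.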
It’s not a limitation of the instances I’m using either: one part of the application runs on Azure SQL against a subset of the data, but still the same number of records, and it is able to return in a fraction of the time. Indeed the issue seems to come from the fact that Azure Tables lacks the ability to iterate and re-runs the giant query every time I request the next 1000 records. This often runs into the execution time limit which terminates all connections from my instance to the storage, causing a flurry of errors to occur. The solution seems clear though, I need to move off Azure Tables and onto Azure SQL.
Realistically I should’ve realised this a lot sooner as there are numerous queries I make on things other than the partition and row keys which are critical to the way my application functions. This comes with its own challenges as scaling out the application becomes a lot harder, but honestly I’m kidding myself by thinking I’ll need that level of scalability any time soon, especially when I can simply move database tables around on Azure instances to get the required performance. Once that’s not enough I’ll finally try to understand SQL Federations properly and that will sort it for good.
Maybe it’s my corporate IT roots but I’ve always thought that the best cloud strategy would be a combination of in house resources that would have the ability to offload elsewhere when extra resources were required. Such a deployment would mean that organisations could design their systems around base loads and have the peak handled by public clouds, saving them quite a bit of cash whilst still delivering services at an acceptable level. It would also gel well with management types as not many are completely comfortable being totally reliant on a single provider for any particular service which in light of recent cloud outages is quite prudent. For someone like myself I was more interested in setting up a few Azure instances so I could test my code against the real thing rather than the emulator that comes with Visual Studio as I’ve always found there’s certain gotchas that don’t show up until you’re running on a real instance.
Now the major cloud providers (Rackspace, AWS, et al.) haven’t really expressed much interest in supporting configurations like this, which makes business sense for them since doing so would more than likely eat into their sales targets. They could license the technology of course but that brings with it a whole bunch of other problems, like deciding what the supported configurations are and releasing some measure of control over the platform so that end users can deploy their own nodes. However I had long thought Microsoft, who has a long history of letting users install stuff on their own hardware, would eventually allow Azure to run in some scaled down fashion to facilitate this hybrid cloud idea.
Indeed many developments in their Azure product seemed to support this, the strongest of which being the VM role which allowed you to build your own virtual machine then run it on their cloud. Microsoft have offered their Azure Appliance product for a while as well, allowing large scale companies and providers the opportunity to run Azure on their own premises. Taking this all into consideration you’d think that Microsoft wasn’t too far away from offering a solution for medium organisations and developers that were seeking to go to the Azure platform but also wanted to maintain some form of control over their infrastructure.
After talking with a TechEd bound mate of mine however, it seems that idea is off the table.
VMware has had their hybrid cloud product (vCloud) available for quite some time and whilst it satisfies most of the things I’ve been talking about so far it doesn’t have the sexy cloud features like an in-built scalable NoSQL database or binary object storage. Since Microsoft had their Azure product I had assumed they weren’t interested in competing with VMware on the same level but after seeing one of the TechEd classes and subsequently browsing their cloud site it looks like they’re launching SCVMM 2012 as a direct competitor to vCloud. This means that Microsoft is basically taking the same route by letting you build your own private cloud, which is basically just a large pool of shared resources, forgoing any implementation of the features that make Azure so gosh darn sexy.
Figuring that out left me a little disappointed, but I can understand why they’re doing it.
Azure, as great as I think it is, probably doesn’t make sense in a deployment scenario of anything less than a couple hundred nodes. Much of Azure’s power, like any cloud provider, comes from its large number of distributed nodes which provide redundancy, flexibility and high performance. The Hyper-V based private cloud then is more tailored to the lower end where enterprises likely want more control than what Azure would provide, not to mention that experience in deploying Azure instances is limited to Microsoft employees and precious few from the likes of Dell, Fujitsu and HP. Hyper-V then is the better solution for those looking to deploy a private cloud and should they want to burst out to a public cloud they’ll just have to code their application to be able to do that. Such a feature isn’t impossible, but it is an additional cost that will need to be considered.
I’m a really big fan of Microsoft’s development tools. No other IDE that I’ve used to date can hold a candle to the mighty Visual Studio, especially when you couple it with things like ReSharper and the massive online communities dedicated to overcoming any of the shortcomings that you might encounter along the way. The same communities are also responsible for developing many additional frameworks in order to extend the Microsoft platforms even further, with many of them making their way into official SDKs. There have only been a few times when I’ve found myself treading new ground with Microsoft tools which no one has before, but every time I have I’ve discovered so much more than I initially set out to.
I’ve come to call these encounters “black magic moments”.
You see with the ease of developing with a large range of solutions already laid out for you it becomes quite tempting to slip into the habit of seeking out a completed solution, rather than building one of your own. Indeed there were a few design decisions in my previous applications that were driven by this, mostly because I didn’t want to dive under the hood of those solutions to develop the fix for my particular problem. It’s quite surprising how far you can get into developing something by doing this but eventually the decisions you make will corner you into a place where you have to make a choice between doing some real development or scrapping a ton of work. Microsoft’s development ideals seem to encourage the latter (in favor of using one of their tried and true solutions) but stubborn engineers like me hate having to do rework.
This of course means diving beneath the surface of Microsoft’s black boxes and poking around to get an idea of what the hell is going on. My first real attempt at this was back in the early days of the Lobaco code base when I had decided that everything should be done via JSON. Everything was working out quite well until I started trying to POST a JSON object to my webservice, whereupon it would throw out all sorts of errors about not being able to de-serialize the object. I spent the better part of 2 days trying to figure that problem out and got precisely nowhere, eventually posting my frustrations to the Silverlight forums. Whilst I didn’t get the actual answer from there they did eventually lead me down a path that got me there, but the solution is not documented anywhere nor does it seem that anyone else has attempted such a feat before (or after for that matter).
I hit another Microsoft black magic moment when I was working on my latest project that I had decided would be entirely cloud based. After learning my way around the ins and outs of the Windows Azure platform I took it upon myself to migrate the default authentication system built into ASP.NET MVC 3 onto Microsoft’s cloud. Thanks to a couple handy tutorials the process of doing so seemed fairly easy so I set about my task, converting everything into the cloud. However upon attempting to use the thing I just created I was greeted with all sorts of random errors and no amount of massaging the code would set it straight. After the longest time I found that it came down to a nuance of the Azure Tables storage part of Windows Azure, namely the way it structures data.
In essence Azure Tables is one of them newfangled NoSQL type databases and as such it relies on a couple properties in your object class to uniquely identify a row and provide scalability. These two properties are called PartitionKey and RowKey and whilst you can leave them alone and your app will still work it won’t be able to leverage any of the cloud goodness. So in my implementation I had overridden these properties in order to get the scalability that I wanted but had neglected to include any setters for them. This didn’t seem to be a problem when storing objects in Azure Tables but when querying them it seems that Azure requires the setters to be there, even if they do nothing at all. Adding them in fixed nearly every problem I was encountering and brought me back to another problem I had faced in the past (more on that when I finally fix it!).
Like any mature framework that does a lot of the heavy lifting for you Microsoft’s solutions suffer when you start to tread unknown territory. Realistically though this should be expected and I’ve found I spend the vast majority of my time on less than 20% of the code that ends up making the final solution. The upshot is of course that once these barriers are down progress accelerates at an extremely rapid pace, as I saw with both the Silverlight and iPhone clients for Lobaco. My cloud authentication services are nearly ready for prime time and since I struggled so much with this I’ll be open sourcing my solution so that others can benefit from the numerous hours I spent on this problem. It will be my first ever attempt at open sourcing something that I created and the prospect both thrills and scares me, but I’m looking forward to giving back a little to the communities that have given me so much.
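The missing setter problem is easy to reproduce in miniature. The sketch below is a Python analogy, not the C# from my project and not how the Azure client library is actually implemented: `ReadOnlyKeysUser`, `SettableKeysUser`, `to_wire`, `from_wire` and the way the keys are derived from `username` are all made up for illustration. It assumes only that the reading side rebuilds an object by assigning every stored property back onto it, which is where a getter-only property falls over.

```python
class ReadOnlyKeysUser:
    """Entity whose keys are computed from other fields, with no setters."""

    def __init__(self, username="", email=""):
        self.username = username
        self.email = email

    @property
    def PartitionKey(self):
        # Hypothetical partitioning scheme: first letter of the username.
        return self.username[:1].lower()

    @property
    def RowKey(self):
        return self.username


class SettableKeysUser(ReadOnlyKeysUser):
    """Same entity, plus setters that deliberately ignore the value."""

    @ReadOnlyKeysUser.PartitionKey.setter
    def PartitionKey(self, value):
        pass  # the key is still derived from username

    @ReadOnlyKeysUser.RowKey.setter
    def RowKey(self, value):
        pass  # the key is still derived from username


def to_wire(entity):
    """Storing only reads the properties, so missing setters go unnoticed."""
    return {
        "PartitionKey": entity.PartitionKey,
        "RowKey": entity.RowKey,
        "username": entity.username,
        "email": entity.email,
    }


def from_wire(cls, data):
    """Querying rebuilds an object by assigning every property back."""
    entity = cls()
    for name, value in data.items():
        setattr(entity, name, value)  # AttributeError if there is no setter
    return entity
```

Here `to_wire(ReadOnlyKeysUser("alice", "alice@example.com"))` works fine, but passing the result to `from_wire(ReadOnlyKeysUser, ...)` raises `AttributeError` as soon as it tries to assign `PartitionKey`. The same call with `SettableKeysUser` succeeds, and the keys come out right because they are recomputed from `username` once that has been assigned.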