Windows Azure Worker Role Storage

Over-Thinking Scalability (or You Probably Don’t Need NOSQL).

As always I’m not-so-secretly working on a side project of mine (although I’ve kept its true nature a secret from most) which uses Windows Azure as the underlying platform. I’ve been working on it for the past 3 months or so and whilst it isn’t my first Azure application it is the first one that I’ve actually put into production. That means I’ve had to deal with all the issues that come with doing that, from building an error reporting framework to making code changes that have no effect in development but fix critical issues once the application is deployed. I’ve also come to the realisation that some of the architectural decisions I made, ones made with an eye cast towards future scalability, aren’t as sound as I first thought they were.


I’ve previously touched on some of the issues and considerations that come with Azure Tables, but what I haven’t dug into is the reasons you would choose to use it. On the surface it looks like a stripped down version of a relational database, missing some features but making up for it by being an extremely cheap way of storing a whole lot of data. Figuring that my application was going to be huge some day (as all us developers do) I made the decision to use Azure Tables for everything. Sure, querying the data was a little cumbersome, but there were ways to code around that, and code around it I did. The end solution does work as intended when deployed into production but there are some quirks which don’t sit well with me.

For starters, querying data from Azure Tables on anything but the partition key and row key will force a table scan. Those familiar with NOSQL style databases will tell me that that’s the point: storage services like these are optimized for key-based lookups and outside of that you’re better off using an old fashioned SQL database. I realised this when I was developing it, however the situations I had in mind fit in well with the partition/row key paradigm, as often I’d need to get a whole partition, a single record or (and this is the killer) the entire table itself. Whilst Azure Tables might be great at the first 2 things it’s absolutely rubbish at the last one, and this causes me no end of issues.
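To make the cost difference concrete, here is a rough sketch of the three access patterns. It is a toy in-memory model written in Python, not the Azure storage client: `ToyTable`, its methods and the `status` property are all names I’ve made up for illustration, and it only counts how many entities each kind of query has to look at.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a key-partitioned table (hypothetical, not the Azure SDK)."""

    def __init__(self):
        # partition_key -> {row_key -> entity dict}
        self._partitions = defaultdict(dict)
        self.entities_examined = 0  # how many entities queries have touched

    def insert(self, partition_key, row_key, **properties):
        self._partitions[partition_key][row_key] = dict(
            properties, PartitionKey=partition_key, RowKey=row_key)

    def point_lookup(self, partition_key, row_key):
        # Both keys known: exactly one entity is examined.
        self.entities_examined += 1
        return self._partitions.get(partition_key, {}).get(row_key)

    def partition_query(self, partition_key):
        # Only the partition key known: every entity in that one partition.
        rows = list(self._partitions.get(partition_key, {}).values())
        self.entities_examined += len(rows)
        return rows

    def property_query(self, name, value):
        # Filter on a non-key property: every entity in every partition.
        matches = []
        for partition in self._partitions.values():
            for entity in partition.values():
                self.entities_examined += 1
                if entity.get(name) == value:
                    matches.append(entity)
        return matches


# 10 partitions of 100 entities each; one entity per partition is "failed".
table = ToyTable()
for p in range(10):
    for r in range(100):
        table.insert(f"user-{p}", f"{r:04d}", status="failed" if r == 7 else "ok")

table.entities_examined = 0
table.point_lookup("user-3", "0007")
point_cost = table.entities_examined       # 1

table.entities_examined = 0
table.partition_query("user-3")
partition_cost = table.entities_examined   # 100

table.entities_examined = 0
failed = table.property_query("status", "failed")
scan_cost = table.entities_examined        # 1000, to find 10 matches
```

The real service is obviously more sophisticated than a dictionary of dictionaries, but the shape of the problem is the same: the work for a key-based query grows with the size of the answer, whilst the work for anything else grows with the size of the table.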

In the beginning I, like most developers, simply developed something that worked. This included a couple of calls along the lines of “get all the records in this table then do something with each of them”. This worked well up until I started needing hundreds of thousands of rows returned, at which point the query was often killed long before it could complete. Frustrated, I implemented a solution that iterated over all records in the table by requesting them and then following the continuation tokens as they were given to me. This kind of worked, although anyone who’s worked with Azure and LINQ will tell you that I reinvented the wheel by forgoing the .AsTableServiceQuery() method, which does all of that for you. Indeed the end result was essentially the same and the only way around it was to put in some manual retry logic (in addition to the regular RetryPolicy). This works but retrieving and iterating over 800,000 records takes some 5 hours to complete, which is unacceptable when I can do the same thing on my home PC in a minute or two.
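The continuation-token loop with manual retries looks roughly like the sketch below. It is Python against a fake paged service, not my actual code and not the Azure client library: `fetch_all`, `make_fake_service` and `TransientError` are hypothetical names, and the token here is just an integer offset rather than whatever the storage service really hands back.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or dropped connection."""


def fetch_all(fetch_page, max_attempts=3, backoff_seconds=0.0):
    """Yield every entity by following continuation tokens.

    fetch_page(token) must return (entities, next_token), where
    next_token is None once the last page has been returned.
    """
    token = None
    while True:
        for attempt in range(1, max_attempts + 1):
            try:
                entities, next_token = fetch_page(token)
                break
            except TransientError:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
                time.sleep(backoff_seconds * attempt)
        yield from entities
        if next_token is None:
            return
        token = next_token


def make_fake_service(total, page_size=1000, fail_on_calls=()):
    """Build a fake paged endpoint that fails on the given call numbers."""
    calls = {"count": 0}

    def fetch_page(token):
        calls["count"] += 1
        if calls["count"] in fail_on_calls:
            raise TransientError("simulated timeout")
        start = 0 if token is None else token
        end = min(start + page_size, total)
        next_token = end if end < total else None
        return list(range(start, end)), next_token

    return fetch_page, calls


# 2,500 fake records in pages of 1,000, with the second request failing once.
fetch_page, calls = make_fake_service(2500, fail_on_calls={2})
rows = list(fetch_all(fetch_page))
```

The one design choice worth pointing out is that a failed request is retried with the same token, so a timeout part way through costs one page rather than the whole run. It does nothing for the time each page takes to come back, which is where my 5 hours goes.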

It’s not a limitation of the instances I’m using either: one part of the application uses Azure SQL with a subset of the data but the same number of records, and it is able to return them in a fraction of the time. Indeed the issue seems to come from the fact that Azure Tables lacks the ability to iterate and re-runs the giant query every time I request the next 1000 records. This often runs into the execution time limit, which terminates all connections from my instance to the storage and causes a flurry of errors. The solution seems clear though: I need to move off Azure Tables and onto Azure SQL.

Realistically I should’ve realised this a lot sooner, as there are numerous queries I make on things other than the partition and row keys which are critical to the way my application functions. Moving comes with its own challenges, as scaling out the application becomes a lot harder, but honestly I’m kidding myself by thinking I’ll need that level of scalability any time soon. For now I can simply move database tables around on Azure instances to get the required performance, and once that’s not enough I’ll finally try to understand SQL Federations properly and that will sort it for good.
