As always I’m not-so-secretly working on a side project of mine (although I’ve kept its true nature a secret from most) which utilizes Windows Azure as the underlying platform. I’ve been working on it for the past 3 months or so and whilst it isn’t my first Azure application it is the first one that I’ve actually put into production. That means I’ve had to deal with all the issues associated with doing that, from building an error reporting framework to making code changes that have no effect in development but fix critical issues when the application is deployed. I’ve also come to the realisation that some of the architectural decisions I made, ones done with an eye cast towards future scalability, aren’t as sound as I first thought they were.
I’ve touched on some of the issues and considerations around Azure Tables previously but what I haven’t dug into is the reasons you would choose to use it. On the surface it looks like a stripped down version of a relational database, missing some features but making up for it by being an extremely cheap way of storing a whole lot of data. Figuring that my application was going to be huge some day (as all us developers do) I made the decision to use Azure Tables for everything. Sure querying the data was a little cumbersome but there were ways to code around that, and code around I did. The end solution does work as intended when deployed into production but there are some quirks which don’t sit well with me.
For starters querying data from Azure Tables on anything but the partition key and row key will force a table scan. Those familiar with NoSQL style databases will tell me that’s the point: storage services like these are optimized for this situation and outside of that you’re better off using an old fashioned SQL database. I realised this when I was developing it however the situations I had in mind fit in well with the partition/row key paradigm as often I’d need to get a whole partition, a single record or (and this is the killer) the entire table itself. Whilst Azure Tables might be great at the first 2 things it’s absolutely rubbish at the latter and this causes me no end of issues.
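To make the difference between key lookups and table scans concrete, here’s a rough Python sketch. It is an in-memory toy, not the Azure storage client: `ToyTable`, its methods and the `colour` property are all made up for illustration, and `rows_examined` just counts how many rows each style of query has to touch.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table indexed by (partition key, row key).

    Hypothetical: this is not the Azure client, only a model of the access pattern.
    """

    def __init__(self):
        self._partitions = defaultdict(dict)
        self.rows_examined = 0  # how many rows the queries so far had to touch

    def insert(self, partition_key, row_key, **props):
        self._partitions[partition_key][row_key] = props

    def get(self, partition_key, row_key):
        # Point lookup: both keys known, so exactly one row is touched.
        self.rows_examined += 1
        return self._partitions[partition_key][row_key]

    def get_partition(self, partition_key):
        # Partition query: only the rows inside that partition are touched.
        rows = self._partitions[partition_key]
        self.rows_examined += len(rows)
        return list(rows.values())

    def where(self, predicate):
        # Filter on a non-key property: every row in every partition is touched.
        matches = []
        for rows in self._partitions.values():
            for props in rows.values():
                self.rows_examined += 1
                if predicate(props):
                    matches.append(props)
        return matches


def build_demo_table(partitions=100, rows_per_partition=100):
    """Fill a ToyTable with made-up rows; one 'red' row per partition."""
    table = ToyTable()
    for p in range(partitions):
        for r in range(rows_per_partition):
            table.insert(f"p{p}", f"r{r}", colour="red" if r == 0 else "blue")
    return table
```

With the default 100 x 100 table a `get` touches one row, `get_partition` touches 100, and `where` on `colour` touches all 10,000 no matter how few rows match. "Get the entire table" is that last case every single time.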
In the beginning I, like most developers, simply developed something that worked. This included a couple calls along the lines of “get all the records in this table then do something with each of them”. This worked well up until I started getting hundreds of thousands of rows needing to be returned which often ended with the query being killed long before it could complete. Frustrated I implemented a solution that attempted to iterate over all records in the table by requesting all of the records and then following the continuation tokens as they were given to me. This kind of worked although anyone who’s worked with Azure and LINQ will tell you that I reinvented the wheel by forgoing the .AsTableServiceQuery() method which does that all for you. Indeed the end result was essentially the same and the only way around it was to put in some manual retry logic (in addition to the regular RetryPolicy). This works but retrieving/iterating over 800,000 records takes some 5 hours to complete, unacceptable when I can do the same thing on my home PC in a minute or two.
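The manual retry idea can be sketched roughly like this, in Python rather than .NET and against a made-up `fetch_page(token)` callable instead of the real storage client. `TransientError`, `iterate_with_resume` and the backoff numbers are assumptions for illustration only. The point is that a failed page request is retried from the last good continuation token rather than restarting the whole walk.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a dropped connection or timed-out query."""


def iterate_with_resume(fetch_page, max_attempts=5, base_delay=1.0):
    """Yield every row, retrying a failed page from the last good token.

    fetch_page(token) is assumed to return (rows, next_token), with
    next_token set to None once the final page has been returned.
    """
    token = None
    attempts = 0
    while True:
        try:
            rows, next_token = fetch_page(token)
        except TransientError:
            attempts += 1
            if attempts >= max_attempts:
                raise  # give up after too many failures in a row
            # Exponential backoff, then retry the same page.
            time.sleep(base_delay * 2 ** (attempts - 1))
            continue
        attempts = 0  # a successful page resets the failure counter
        yield from rows
        if next_token is None:
            return
        token = next_token
```

Because it’s a generator the caller can process rows as they arrive. It doesn’t make the underlying queries any faster; it only stops one failure from throwing away the progress already made.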
It’s not a limitation of the instances I’m using either: one part of the application uses Azure SQL with a subset of the data, but still the same number of records, and it is able to return in a fraction of the time. Indeed the issue seems to come from the fact that Azure Tables lacks the ability to iterate and re-runs the giant query every time I request the next 1000 records. This often runs into the execution time limit which terminates all connections from my instance to the storage, causing a flurry of errors to occur. The solution seems clear though, I need to move off Azure Tables and onto Azure SQL.
Realistically I should’ve realised this a lot sooner as there are numerous queries I make on things other than the partition and row keys which are critical to the way my application functions. This comes with its own challenges as scaling out the application becomes a lot harder but honestly I’m kidding myself by thinking I’ll need that level of scalability any time soon, especially when I can simply move database tables around on Azure instances to get the required performance. Once that’s not enough I’ll finally try to understand SQL Federations properly and that will sort it for good.
Windows Azure Tables are one of those newfangled NoSQL type databases that excel at storing giant swaths of structured data. For what they are they’re quite good as you can store very large amounts of data in there without having to pay through the nose like you would for a traditional SQL server or an Azure instance of SQL. However that advantage comes at a cost: querying the data on anything but the partition key (think of it as a partition of the data within a table) and the row key (the unique identifier within that partition) results in queries that take quite a while to run, especially when compared to their SQL counterparts. There are ways to get around this however no matter how well you structure your data eventually you’ll run up against this limitation and that’s where things start to get interesting.
By default whenever you do a large query against an Azure Table you’ll only get back 1000 records, even if the query will return more. However if your query did have more results than that you’ll be able to access them via a continuation token that you can add to your original query, telling Azure that you want the records past that point. For those of us coding on the native .NET platform we get the lovely benefit of having all of this handled for us directly by simply adding .AsTableServiceQuery() to the end of our LINQ statements (if that’s what you’re using) which will handle the continuation tokens for us. For most applications this is great as it means you don’t have to fiddle around with the rather annoying way of extracting those tokens out of the response headers.
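For anyone who hasn’t seen the pattern, here’s a rough Python sketch of what that continuation handling amounts to. It is a toy model, not the real storage protocol: `query_page` plays the server, the table is just a list, and the token is simply the index to resume from, whereas the real service hands back opaque tokens in the response headers.

```python
PAGE_LIMIT = 1000  # the per-response cap described above


def query_page(table, predicate, continuation=None, limit=PAGE_LIMIT):
    """Toy server: return up to `limit` matching rows plus a continuation token.

    The token is None once the whole table has been examined.
    """
    index = continuation or 0
    matches = []
    while index < len(table):
        if len(matches) == limit:
            # Page is full and there are rows left to examine: hand back a token.
            return matches, index
        if predicate(table[index]):
            matches.append(table[index])
        index += 1
    return matches, None


def query_all(table, predicate):
    """Toy client: keep following tokens until the server stops returning one."""
    results, token = [], None
    while True:
        page, token = query_page(table, predicate, token)
        results.extend(page)
        if token is None:
            return results
```

The client loop is the part that gets hidden when the library follows the tokens for you. Note that it happily keeps asking for pages until the table is exhausted, which is exactly how a query that looks harmless in code ends up pulling back everything.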
Of course that leads you down the somewhat lazy path of not thinking about the kinds of queries you’re running against your Tables and this can lead to problems down the line. Since Azure is a shared service there are upper limits on how long queries can run and how much data they can return to you. These limits aren’t exactly set in stone and depending on how busy the particular server you’re querying is or the current network utilization at the time your query could either take an incredibly long time to return or could simply end up getting closed off. Anyone who’s developed for Azure in the past will know that this is pretty common, even for the more robust things like Azure SQL, but there’s one thing that I’ve noticed over the past couple weeks that I haven’t seen mentioned anywhere else.
As the above paragraphs might indicate I have a lot of queries that try and grab big chunks of data from Azure Tables and have, of course, coded in RetryPolicies so they’ll keep at it if they should fail. There’s one thing that all the policies in the world won’t protect you from however and that’s connections that are forcibly closed. I’ve had quite a few of these recently and I noticed that they appear to come in waves, rippling through all my threads causing unhandled exceptions and forcing them to restart themselves. I’ve done my best to optimize the queries since then and the errors have mostly subsided but it appears that should one long running query trigger Azure to force the connection closed all connections from that instance to the same Table storage will also be closed.
Depending on how your application is coded this might not be an issue however for mine, where the worker role has about 8 concurrent threads running at any one time all attempting to access the same Table Storage account, it means one long running query that gets terminated triggers a cascade of failures across the rest of the threads. For the most part this was avoided by querying directly on row and partition keys however the larger queries had to be broken up using the continuation tokens and then the results concatenated in memory. This introduces another limit on particular queries (as storing large lists in memory isn’t particularly great) which you’ll have to architect your code around. It’s by no means an unsolvable problem however it was one that has forced me to rethink certain parts of my application which will probably need to be on Azure SQL rather than Azure Tables.
Like any cloud platform Azure is a great service which requires you to understand what its various services are good for and what they’re not. I initially set out to use Azure Tables for everything and have since found that it’s simply not appropriate for that, especially if you need to query on parameters that aren’t the row or partition keys. If you have connections being closed on you inexplicably be sure to check for any potentially long running queries on the same role; as this post can attest, they could very well be the source of what ails you.
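One way to dodge the memory limit, where the work allows it, is to process each page as it arrives instead of concatenating everything first. The Python sketch below is only an illustration of that idea, not what the application described here does: `make_fetcher` fakes a paged data source, and `count_by` is a made-up example of an aggregation that only ever holds one page at a time.

```python
def make_fetcher(rows, page_size):
    """Hypothetical paged source: fetch_page(token) -> (rows, next_token)."""
    def fetch_page(token):
        start = token or 0
        end = start + page_size
        return rows[start:end], (end if end < len(rows) else None)
    return fetch_page


def iter_pages(fetch_page):
    """Yield one page at a time, following tokens until there are none left."""
    token = None
    while True:
        rows, token = fetch_page(token)
        yield rows
        if token is None:
            return


def count_by(fetch_page, key):
    """Tally rows per value of `key` without building one giant list."""
    totals = {}
    for page in iter_pages(fetch_page):
        for row in page:
            totals[row[key]] = totals.get(row[key], 0) + 1
    return totals
```

This only helps when the per-row work doesn’t need the whole result set at once. Anything that genuinely needs all the rows together, such as a sort, still has to hold them somewhere.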
I’m a stickler for avoiding rework where I can, opting instead to make the most of what I already have before I set out on trying to rework something. You’d think that’d lead me to create overly complicated systems that have multiple nuances and edge cases but since I know I hate reworking stuff I’ll go out of my way to make things right the first time, even if it costs me a bit more initially. For the most part this works well and even when it comes time to dump something and start over again much of my previous work will make it into the reworked product, albeit in a different form.
I hit such a dilemma last weekend when I was working on my latest project. As long time readers will know I’m a pretty big fan of Microsoft’s Azure services and I decided to use them as the platform for my next endeavour. For the most part it’s been quite good, getting started with the development environment was painless and once I got familiar with the features and limitations of the Azure platform I was able to create the basic application in almost no time at all. Everything was going great until I started to hit some of the fundamental limitations of one of Azure’s services, namely the Table Storage.
For the uninitiated Azure Table Storage is like a database, but not in the traditional sense. It’s one of them newfangled NoSQL type databases, the essential difference being that this kind of database doesn’t have a fixed schema or layout of how the data is stored. Considering that having a fixed layout of how the data is stored is where a database draws many of its advantages from you’d wonder what doing away with it would do for you. What it does is allow for a much higher level of scalability than a traditional database does and thus NoSQL type databases power many large apps, including things like Facebook and Twitter. Figuring that the app might be big one day (and put off by Microsoft’s rather ludicrous pricing for SQL Azure) I settled on using it as my main data store.
However whilst there’s a lot of good things about Azure Table Storage there’s one downside that really hurts its usability: its limited query engine. You see whilst you can query it with good old fashioned LINQ the query parameters it supports are rather limited. In fact they’re limited to single parameter matches or boolean equivalences which, whilst working for a lot of use cases, don’t cater to user constructed queries very well. Indeed my application, where someone could search for a single name but the object could contain up to 8 (some of them set, some of them not), meant that I had to construct the query on the fly for the user. No problem I hear you say, LINQKit’s Predicate Builder can build that for you! Well you’d be wrong unfortunately since the resulting LINQ statement confuses the poor Azure Storage Client and the query errors out.
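To give a feel for what "constructing the query on the fly" means here, the Python sketch below builds an OData-style filter string that matches one search term against several name properties. It is not the LINQ code from the application, and whether a hand-built filter like this would have avoided the storage client error is untested; `build_name_filter` and the `Name1`, `Name2` style property names are made up for illustration.

```python
def build_name_filter(search_term, name_fields):
    """Build an OData-style filter matching search_term against any name field.

    Hypothetical helper: the field names are whatever properties the entity
    stores its names under.
    """
    if not name_fields:
        raise ValueError("at least one field is required")
    # OData string literals escape a single quote by doubling it.
    literal = search_term.replace("'", "''")
    clauses = [f"({field} eq '{literal}')" for field in name_fields]
    return " or ".join(clauses)


# Example: one search term checked against eight hypothetical name properties.
NAME_FIELDS = [f"Name{i}" for i in range(1, 9)]
example_filter = build_name_filter("Smith", NAME_FIELDS)
```

Even when the filter is built successfully, none of those properties is a partition or row key, so a query like this still ends up scanning rather than seeking.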
So at this point I was faced with a difficult decision: manually crank out all the queries (which would end up being huge and ridiculously unmaintainable) whilst keeping my Table Storage back end or bite the bullet and move everything into SQL Azure. Whilst I knew that writing out the queries would be a one time only task (a very time consuming one) I couldn’t shake that feeling that doing that would just be the wrong thing to do in the long run, leaving me with an unmaintainable system that I’d curse constantly. I haven’t made the changes yet, that’s this weekend’s goal, but I know it’s not going to be as trouble free as I hope it will.
Sometimes you just have to swallow that bitter pill and it’s usually better to do it sooner rather than later. Azure Table Storage was perfect for me in the beginning but as my requirements evolved the reality of the situation became apparent and I’m stuck in the unfortunate position of having to do rework that I tried so hard to avoid. My project and I will be better for it but it’s always tough when you’ve tried everything you could in order to avoid it and came up empty.