If you’ve ever worked in a multi-tenant environment with shared resources you’ll know of the many pains that can come along with it. Resource sharing always ends up leading to contention and some of the time this will mean that you won’t be able to get access to the resources you want. For cloud services this is par for the course as since you’re always accessing shared services and so any application you build on these kinds of platforms has to take this into consideration lets your application spend an eternity crashing from random connection drop outs. Thankfully Microsoft has provided a few frameworks which will handle these situations for you, especially in the case of Azure SQL.
The Transient Fault Handling Application Block (or Topaz, which is a lot better in my view) gives you access to a number of classes which take out a lot of the pain when dealing with the transient errors you get when using Azure services. Of those the most useful one I’ve found is the RetryPolicy which when instantiated as SqlAzureTransientErrorDetectionStrategy allows you to simply wrap your database transactions with a little bit of code in order to make them resistant to the pitfalls of Microsoft’s cloud SQL service. For the most part it works well as prior to using it I’d get literally hundreds of unhandled exception messages per day. It doesn’t catch everything however so you will still need to handle some connection errors but it does a good job of eliminating the majority of them.
Currently however there’s no native support for it in Entity Framework (Microsoft’s data persistence framework) and this means you have to do a little wrangling in order to get it to work. This StackOverflow question outlines the problem and there’s a couple solutions on there which all work however I went for the simple route of instantiating a RetryPolicy and then just wrapping all my queries with ExecuteAction. As far as I could tell this all works fine and is the supported way of using EF with Topaz at least until 1.6 comes out which will have in built support for connection resiliency.
However when using Topaz in this way it seems that it mucks with entity tracking, causing returned objects to not be tracked in the normal way. I discovered this after I noticed many records not getting updated even though manually working through the data showed that they should be showing different values. As far as I can tell if you wrap an EF query with a RetryPolicy the entity ends up not being tracked and you will need to .Attach() to it prior to making any changes. If you’ve used EF before then you’ll see why this is strange as you usually don’t have to do that unless you’ve deliberately detached the entity or recreated the context. So as far as I can see there must be something in Topaz that causes it to become detached requiring you to reattach it if you want to persist your changes using Context.SaveChanges().
I haven’t tested any of the other methods of using Topaz with EF so it’s entirely possible there’s a way to get the entity tracked properly without having to attach to it after performing the query. Whether they work or not will be an exercise left for the reader as I’m not particularly interested in testing it, at least not just after I got it all working again. By the looks of it though a RC version of EF 6 might not be too far away, so this issue probably won’t remain one for long.
Windows Azure Tables are one of those newfangled NoSQL type databases that excels in storing giant swaths of structured data. For what they are they’re quite good as you can store very large amounts of data in there without having to pay through the nose like you would for a traditional SQL server or an Azure instance of SQL. However that advantage comes at a cost: querying the data on anything but the partition key (think of it as a partition of the data within a table) and the row key (the unique identifier within that partition) results in queries that take quite a while to run, especially when compared to its SQL counter parts. There are ways to get around this however no matter how well you structure your data eventually you’ll run up against this limitation and that’s where things start to get interesting.
By default whenever you do a large query against an Azure Table you’ll only get back 1000 records, even if the query will return more. However if your query did have more results than that you’ll be able to access them via a continuation token that you can add to your original query, telling Azure that you want the records past that point. For those of us coding on the native .NET platform we get the lovely benefit of having all of this handled for us directly by simply adding .AsTableServiceQuery() to the end of our LINQ statements (if that’s what you’re using) which will handle the continuation tokens for us. For most applications this is great as it means you don’t have to fiddle around with the rather annoying way of extracting those tokens out of the response headers.
Of course that leads you down the somewhat lazy path of not thinking about the kinds of queries you’re running against your Tables and this can lead to problems down the line. Since Azure is a shared service there are upper limits on how long queries can run and how much data they can return to you. These limits aren’t exactly set in stone and depending on how busy the particular server you’re querying is or the current network utilization at the time your query could either take an incredibly long time to return or could simply end up getting closed off. Anyone who’s developed for Azure in the past will know that this is pretty common, even for the more robust things like Azure SQL, but there’s one thing that I’ve noticed over the past couple weeks that I haven’t seen mentioned anywhere else.
As the above paragraphs might indicate I have a lot of queries that try and grab big chunks of data from Azure Tables and have, of course, coded in RetryPolicies so they’ll keep at it if they should fail. There’s one thing that all the policies in the world won’t protect you from however and that’s connections that are forcibly closed. I’ve had quite a few of these recently and I noticed that they appear to come in waves, rippling through all my threads causing unhandled exceptions and forcing them to restart themselves. I’ve done my best to optimize the queries since then and the errors have mostly subsided but it appears that should one long running query trigger Azure to force the connection closed all connections from that instance to the same Table storage will also be closed.
Depending on how your application is coded this might not be an issue however for mine, where the worker role has about 8 concurrent threads running at any one time all attempting to access the same Table Storage account, it means one long running query that gets terminated triggers a cascade of failures across the rest of threads. For the most part this was avoided by querying directly on row and partition keys however the larger queries had to be broken up using the continuation tokens and then the results concatenated in memory. This introduces another limit on particular queries (as storing large lists in memory isn’t particularly great) which you’ll have to architect your code around. It’s by no means an unsolvable problem however it was one that has forced me to rethink certain parts of my application which will probably need to be on Azure SQL rather than Azure Tables.
Like any cloud platform Azure is a great service which requires you to understand what its various services are good for and what they’re not. I initially set out to use Azure Tables for everything and have since found that it’s simply not appropriate for that, especially if you need to query on parameters that aren’t the row or partition keys. If you have connections being closed on you inexplicably be sure to check for any potentially long running queries on the same role as this post can attest they could very well be the source of what ales you.
I’m a stickler for avoiding rework where I can, opting instead to make the most of what I already have before I set out on trying to rework something. You’d think that’d lead me to create overly complicated systems that have multiple nuances and edge cases but since I know I hate reworking stuff I’ll go out of my way to make things right the first time, even if it costs me a bit more initially. For the most part this works well and even when it comes time to dump something and start over again much of my previous work will make it into the reworked product, albeit it in a different form.
I hit such a dilemma last weekend when I was working on my latest project. As long time readers will know I’m a pretty big fan of Microsoft’s Azure services and I decided to use them as the platform for my next endeavour. For the most part it’s been quite good, getting started with the development environment was painless and once I got familiar with the features and limitations of the Azure platform I was able to create the basic application in almost no time at all. Everything was going great until I started to hit some of the fundamental limitations of one of Azure services, namely the Table Storage.
For the uninitiated Azure Table Storage is like a database, but not in the traditional sense. It’s one of them new fan dangled NoSQL type databases, the essential difference being that this kind of database doesn’t have a fixed schema or layout of how the data is stored. Considering that having a fixed layout of how the data is stored is where a database draws many of its advantages from you’d wonder what doing away with it would do for you. What it does is allow for a much higher level of scalability than a traditional database does and thus NoSQL type databases power many large apps, including things like Facebook and Twitter. Figuring that the app might be big one day (and Microsoft’s rather ludicrous pricing for SQL Azure) I settled on using it as my main data store.
However whilst there’s a lot of good things about Azure Table Storage there’s one downside that really hurts it’s usability: it’s limited query engine. You see whilst you can query it with good old fashioned LINQ the query parameters it supports are rather limited. In fact they’re limited to single parameter matches or boolean equivalences which, whilst working for a lot of use cases, doesn’t cater towards user constructed queries quite well. Indeed in my application where someone could search for a single name but the object could contain up to 8 (some of them set, some of them not) meant that I had to construct the query on the fly for the user. No problem I hear you say, LINQKit’s Predicate Builder can build that for you! Well you’d be wrong unfortunately since the resulting LINQ statement confuses the poor Azure Storage Client and the query errors out.
So at this point I was faced with a difficult decision: manually crank out all the queries (which would end up being huge and ridiculously unmaintainable) whilst keeping my Table Storage back end or bite the bullet and move everything into SQL Azure. Whilst I knew that writing out the queries would be a one time only task (a very time consuming one) I couldn’t shake that feeling that doing that would just be the wrong thing to do in the long run, leaving me with an unmaintainable system that I’d curse constantly. I haven’t made the changes yet, that’s this weekend’s goal, but I know it’s not going to be as trouble free as I hope it will.
Sometimes you just have to swallow that bitter pill and it’s usually better to do it sooner rather than later. Azure Table Storage was perfect for me in the beginning but as my requirements evolved the reality of the situation became apparent and I’m stuck in the unfortunate position of having to do rework that I tried so hard to avoid. My project and I will be better for it but it’s always tough when you’ve tried everything you could in order to avoid it and came up empty.
Categories: Technology azure, cloud computing, linq, linqkit, microsoft, predicate builder, query, sql, table storage, troubles, woes