Since my side projects (including this blog) don’t really have any kind of revenue generation potential I tend to shy away from spending a lot on them, if I can avoid it. This blog is probably the most extravagant of the lot, getting its own dedicated server which, I’ll admit, is overkill but I’d had such bad experiences with shared providers before that I’m willing to bear the cost. Cloud hosting on the other hand can get nightmarishly expensive if you don’t keep an eye on it and that was the exact reason I shied away from it for any of my side projects. That was until I got accepted into the Microsoft BizSpark program which came with a decent amount of free usage, enough for me to consider it for my next application.
The Azure benefits for BizSpark are quite decent with a smattering of all their offerings chucked in which would easily be enough to power a nascent start up’s site through the initial idea verification stage. That’s exactly what I’ve been using it for and, as longtime readers will tell you, my experiences have been fairly positive with most of the issues arising from my misappropriation of different technologies. The limits, as I found out recently, are hard and running up against them causes all sorts of undesirable behaviour, especially if you run up against your compute or storage limit. I managed to run up against the former due to a misunderstanding of how a preview technology was billed but I hadn’t hit the latter until last week.
So the BizSpark benefits are pretty generous for SQL storage, giving you access to a couple 5GB databases (or a larger number of smaller 1GB ones) gratis. That sounds like a lot, and indeed it should be sufficient for pretty much any burgeoning application, however mine is based around gathering data from another site and then performing some analytics on it so the amount of data I have is actually quite large. In the beginning this wasn’t much of a problem as I had a lot of headroom however after I made a lot of performance improvements I started gathering data at a much faster rate and the 5GB limit loomed over me. In the space of a couple weeks I managed to fill it completely and had to shut it down lest my inbox get filled with “Database has reached its quota” errors.
Looking over the database in the Azure management studio (strangely one of the few parts of Azure that still uses Silverlight) showed that one particular table was consuming the majority of the database. Taking a quick look at the rows it was pretty obvious why this was the case: I had a couple of columns that had lengthy URLs in them and over the 6 million or so records I had this amounted to a huge amount of space being used. No worries I thought, SQL has to have some kind of built in compression to deal with this and so off I went looking for an easy solution.
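To get a feel for how quickly URL columns add up at that scale, here is a rough back-of-envelope estimate. The row count and the "couple of columns" come from the paragraph above; the average URL length and the 2 bytes per character (as for an nvarchar column) are purely my assumptions for illustration, not measured figures.

```python
ROWS = 6_000_000       # "6 million or so records"
URL_COLUMNS = 2        # "a couple of columns"
AVG_URL_CHARS = 150    # assumption: average URL length in characters
BYTES_PER_CHAR = 2     # assumption: nvarchar-style storage, 2 bytes per character

url_bytes = ROWS * URL_COLUMNS * AVG_URL_CHARS * BYTES_PER_CHAR
url_gib = url_bytes / 1024 ** 3
QUOTA_GIB = 5

print("URL columns alone: %.2f GiB of a %d GiB quota" % (url_gib, QUOTA_GIB))
```

Under those assumptions the URL text by itself eats well over half of a 5GB database before indexes or any other columns are counted.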
As it turns out SQL Server does and its implementation would’ve provided the benefits I was looking for without much work on my end. However Azure SQL doesn’t support it and the current solution to this is to implement row based compression inside your application. If you’re straight up dumping large XML files or giant wads of text into SQL rows then this might be of use to you, however if you’re trying to compress data at a page level then you’re out of luck, unless you want to code an extravagant solution (like creating a compression dictionary table in the same database, but that’s borderline psychotic if you ask me).
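For what application-side compression of a big text value might look like, here is a minimal sketch using Python's standard zlib module. The function names and the sample payload are hypothetical, and in a .NET application you would reach for the platform's own compression classes instead; the point is only the shape of the approach, which is compress before writing to a binary column and decompress after reading.

```python
import zlib


def pack_text(value):
    """Compress a text value before writing it to a binary column."""
    return zlib.compress(value.encode("utf-8"), 9)


def unpack_text(blob):
    """Reverse pack_text after reading the column back."""
    return zlib.decompress(blob).decode("utf-8")


# Hypothetical payload: a repetitive XML document, the kind that compresses well.
document = "<items>" + "<item><name>example</name></item>" * 200 + "</items>"
blob = pack_text(document)
print(len(document.encode("utf-8")), "bytes before,", len(blob), "bytes after")
```

This pays off for large, repetitive values. A single short URL compressed on its own gains very little, which is why it is no substitute for page level compression across millions of similar rows.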
The solution for me was to move said problem table into its own database and, during the migration, trim out all the fat contained within the data. There were multiple columns I never ended up using, the URL fields were all very similar and the largest column, the one most likely causing me to chew through so much space, was no longer needed now that I was able to query that data properly rather than having to work around Azure Table Storage’s limitations. Page compression would’ve been an easy quick fix but it would’ve only been a matter of time before I found myself in the same situation, struggling to find space where I could get it.
For me this experience aptly demonstrated why it’s good to work within strict constraints, as left unchecked these issues would’ve hit me much harder later on. Sure it can feel like I’m spinning my wheels when hitting issues like this is a monthly occurrence but I’m still in the learning stage of this whole thing and lessons learned now are far better than ones I learn when I finally move this thing into production.
If you’ve ever worked in a multi-tenant environment with shared resources you’ll know of the many pains that can come along with it. Resource sharing always ends up leading to contention and some of the time this will mean that you won’t be able to get access to the resources you want. For cloud services this is par for the course since you’re always accessing shared services, so any application you build on these kinds of platforms has to take this into consideration lest it spend an eternity crashing from random connection dropouts. Thankfully Microsoft has provided a few frameworks which will handle these situations for you, especially in the case of Azure SQL.
The Transient Fault Handling Application Block (or Topaz, which is a lot better in my view) gives you access to a number of classes which take out a lot of the pain when dealing with the transient errors you get when using Azure services. Of those the most useful one I’ve found is the RetryPolicy which, when instantiated with the SqlAzureTransientErrorDetectionStrategy, allows you to simply wrap your database transactions with a little bit of code in order to make them resistant to the pitfalls of Microsoft’s cloud SQL service. For the most part it works well, as prior to using it I’d get literally hundreds of unhandled exception messages per day. It doesn’t catch everything however, so you will still need to handle some connection errors, but it does a good job of eliminating the majority of them.
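The general idea behind wrapping a database call in a retry policy can be sketched in a few lines. This is not Topaz's API, just an illustration of the pattern in Python; TransientError, execute_with_retry and the flaky query are all made-up names, and the backoff numbers are arbitrary.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a dropped connection or a throttling response."""


def execute_with_retry(action, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run action(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller deal with it
            # Back off 0.1s, 0.2s, 0.4s, ... plus a little random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))


# Usage: a hypothetical query that fails twice before succeeding.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("connection dropped")
    return ["row-1", "row-2"]


rows = execute_with_retry(flaky_query, sleep=lambda seconds: None)
print(rows, "after", calls["count"], "attempts")
```

Anything that isn't classified as transient passes straight through the wrapper, which is the same reason a retry policy can't save you from every connection error.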
Currently however there’s no native support for it in Entity Framework (Microsoft’s data persistence framework) and this means you have to do a little wrangling in order to get it to work. This StackOverflow question outlines the problem and there are a couple of solutions on there which all work, however I went for the simple route of instantiating a RetryPolicy and then just wrapping all my queries with ExecuteAction. As far as I could tell this all works fine and is the supported way of using EF with Topaz, at least until EF 6 comes out which will have built-in support for connection resiliency.
However when using Topaz in this way it seems that it mucks with entity tracking, causing returned objects to not be tracked in the normal way. I discovered this after I noticed many records not getting updated even though manually working through the data showed that they should be showing different values. As far as I can tell if you wrap an EF query with a RetryPolicy the entity ends up not being tracked and you will need to .Attach() to it prior to making any changes. If you’ve used EF before then you’ll see why this is strange as you usually don’t have to do that unless you’ve deliberately detached the entity or recreated the context. So as far as I can see there must be something in Topaz that causes it to become detached requiring you to reattach it if you want to persist your changes using Context.SaveChanges().
I haven’t tested any of the other methods of using Topaz with EF so it’s entirely possible there’s a way to get the entity tracked properly without having to attach to it after performing the query. Whether they work or not will be an exercise left for the reader as I’m not particularly interested in testing it, at least not just after I got it all working again. By the looks of it though an RC version of EF 6 might not be too far away, so this issue probably won’t remain one for long.
If you’re a developer like me you’ve likely got a set of expectations about the way you handle data. Most likely they all have their roots in the object-oriented/relational paradigm meaning that you’d expect to be able to get some insight into your data by simply running a few queries against it or simply looking at the table, possibly sorting it to find something out. The day you decide to try out something like Azure Table storage however you’ll find that these tools simply aren’t available to you any more due to the nature of the service. It’s at this point where, if you’re like me, you’ll get a little nervous as your data can end up feeling like something of a black box.
A while back I posted about how I was over-thinking the scalability of my Azure application and how I was about to make the move to Azure SQL. That’s been my task for the past 3 weeks or so and what started out as a relatively simple task of simply moving data from one storage mechanism to another has turned into this herculean task that has seen me dive deeper into both Azure Tables and SQL than I have ever done previously. Along the way I’ve found out a few things that, whilst not changing my mind about the migration away from Azure tables, certainly would have made my life a whole bunch easier had I known about them.
1. If you need to query all the records in an Azure table, do it partition by partition.
The not-so-fun thing about Azure Tables is that unless you’re keeping track of your data in your application there are no real metrics you can dredge up in order to give you some idea of what you’ve actually got. For me this meant that I had one table that I knew the count of (due to some background processing I do using that table) however there are 2 others for which I have absolutely no idea how much data is actually contained in them. Estimates using my development database led me to believe there was an order of magnitude more data in there than I thought there was, which in turn led me to the conclusion that using .AsTableServiceQuery() to return the whole table was doomed from the start.
However Azure Tables isn’t too bad at returning an entire partition’s worth of data, even if the records number in the 10s or 100s of thousands. Sure the query time goes up linearly depending on how many records you’ve got (as Azure Tables will only return a max of 1000 records at a time) but if they’re all within the same partition you avoid the troublesome table scan which dramatically affects the performance of the query, sometimes to the point of it getting cancelled which isn’t handled by the default RetryPolicy framework. If you need all the data in the entire table you can then do queries on each partition and then dump them all in a list inside your application and then continue to do your query.
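The shape of that approach, sketched in Python against a fake in-memory table. query_partition is a hypothetical stand-in for whatever storage client call filters on the partition key alone, and the sketch assumes you already know (or can enumerate) your partition keys.

```python
PAGE_SIZE = 1000  # the per-response cap described above


def query_partition(table, partition_key, page_size=PAGE_SIZE):
    """Yield one partition's records a page at a time (hypothetical stand-in
    for a storage client query that filters on PartitionKey only)."""
    records = table.get(partition_key, [])
    for start in range(0, len(records), page_size):
        yield records[start:start + page_size]


def read_whole_table(table, partition_keys):
    """Collect every record by walking the partitions one by one."""
    everything = []
    for key in partition_keys:
        for page in query_partition(table, key):
            everything.extend(page)
    return everything


# Fake in-memory table: partition key -> list of records.
fake_table = {
    "2012-05": [{"PartitionKey": "2012-05", "RowKey": str(i)} for i in range(2500)],
    "2012-06": [{"PartitionKey": "2012-06", "RowKey": str(i)} for i in range(300)],
}
all_records = read_whole_table(fake_table, ["2012-05", "2012-06"])
print(len(all_records), "records collected")
```

Each request stays scoped to a single partition, so a slow or cancelled query only costs you that partition's worth of work rather than the whole table.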
2. Optimize your context for querying or updating/inserting records.
Unbeknownst to me the TableServiceContext class has quite a few configuration options available that will allow you to change the way the context behaves. The vast majority of errors I was experiencing came from my background processor which primarily dealt with reading data without making any modifications to the records. If you have applications where this is the case then it’s best to set the Context.MergeOption to MergeOption.NoTracking as this means the context won’t attempt to track the entities.
If you have multiple threads running or queries that return large amounts of records this can lead to a rather large improvement in performance as the context doesn’t have to track any changes to them and the garbage collector can free up these objects even if you use the context for another query. Of course this means that if you do need to make any changes you’ll have to change the context and then attach to the entity in question but you’re probably doing that already. Or at least you should be.
3. Modify your web.config or app.config file to dramatically improve performance and reliability.
For some unknown reason the default number of HTTP connections that a Windows Azure application can make (although I get the feeling this affects all applications making use of the .NET frameworks) is set to 2. Yes just 2. This then manifests itself as all sorts of crazy errors that don’t make a whole bunch of sense like “the underlying connection was closed” when you try to make more than 2 requests at any one time (which includes queries to Azure Tables). The max number of connections you can specify depends on the size of the instance you’re using but Microsoft has a helpful guide on how to set this and other settings in order to make the most out of it.
Additionally some of the guys at Microsoft have collected a bunch of tips for improving the performance of Azure Tables in various circumstances. I’ve cherry picked out the best ones, which I’ve confirmed have worked wonders for me, however there’s a fair few more in there that might be of use to you, especially if you’re looking to get every performance edge you can. Many of them are circumstantial and some require you to plan out your storage architecture in advance (so something that can’t be easily retrofitted into an existing app) but since the others have worked I’d hazard a guess they would too.
I might not be making use of some of these tips now that my application is going to be SQL and Topaz but if I can save anyone the trouble I went through trying to sort through all those esoteric errors I can at least say it was worth it. Some of these tips are just good to know regardless of the platform you’re on (like the default HTTP connection limit) and should be incorporated into your application as soon as it’s feasible. I’ve yet to get all my data into production as it’s still migrating but I get the feeling I might go on another path of discovery with Azure SQL in the not too distant future and I’ll be sure to share my tips for it then.
As always I’m not-so-secretly working on a side project of mine (although I’ve kept its true nature a secret from most) which utilizes Windows Azure as the underlying platform. I’ve been working on it for the past 3 months or so and whilst it isn’t my first Azure application it is the first one that I’ve actually put into production. That means I’ve had to deal with all the issues associated with doing that, from building an error reporting framework to making code changes that have no effect in development but fix critical issues when the application is deployed. I’ve also come to the realisation that some of the architectural decisions I made, ones done with an eye cast towards future scalability, aren’t as sound as I first thought they were.
I’ve touched on some of the issues and considerations that Azure Tables has previously but what I haven’t dug into is the reasons you would choose to use it. On the surface it looks like a stripped down version of a relational database, missing some features but making up for it by being an extremely cheap way of storing a whole lot of data. Figuring that my application was going to be huge some day (as all us developers do) I made the decision to use Azure Tables for everything. Sure querying the data was a little cumbersome but there were ways to code around that, and code around I did. The end solution does work as intended when deployed into production but there are some quirks which don’t sit well with me.
For starters querying data from Azure Tables on anything but the partition key and row key will force a table scan. Those familiar with NoSQL style databases will tell me that that’s the point, storage services like these are optimized for this situation and outside of that you’re better off using an old fashioned SQL database. I realised this when I was developing it, however the situations I had in mind fit in well with the partition/row key paradigm as often I’d need to get a whole partition, a single record or (and this is the killer) the entire table itself. Whilst Azure Tables might be great at the first 2 things it’s absolutely rubbish at the latter and this causes me no end of issues.
In the beginning I, like most developers, simply developed something that worked. This included a couple calls along the lines of “get all the records in this table then do something with each of them”. This worked well up until I started getting hundreds of thousands of rows needing to be returned which often ended with the query being killed long before it could complete. Frustrated I implemented a solution that attempted to iterate over all records in the table by requesting all of the records and then following the continuation tokens as they were given to me. This kind of worked although anyone who’s worked with Azure and LINQ will tell you that I reinvented the wheel by forgoing the .AsTableServiceQuery() method which does that all for you. Indeed the end result was essentially the same and the only way around it was to put in some manual retry logic (in addition to the regular RetryPolicy). This works but retrieving/iterating over 800,000 records takes some 5 hours to complete, unacceptable when I can do the same thing on my home PC in a minute or two.
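The manual retry logic amounts to something like the following Python sketch, which retries a failed page from its own continuation token rather than starting the whole query over. Everything here is hypothetical (fetch_page, QueryTimeout, the integer tokens), and it assumes the token from the last successful page can be reused after a failure.

```python
class QueryTimeout(Exception):
    """Stand-in for a query that the service cancelled."""


def fetch_page(records, token, page_size=1000):
    """Hypothetical page fetch: returns (page, next_token); next_token is None at the end."""
    start = token or 0
    end = start + page_size
    next_token = end if end < len(records) else None
    return records[start:end], next_token


def iterate_with_resume(fetch, max_retries_per_page=3):
    """Yield every record, retrying a failed page from its own token
    instead of restarting the whole query."""
    token = None
    while True:
        for attempt in range(max_retries_per_page):
            try:
                page, next_token = fetch(token)
                break
            except QueryTimeout:
                if attempt == max_retries_per_page - 1:
                    raise  # this page keeps failing: give up
        yield from page
        if next_token is None:
            return
        token = next_token


# Usage: fail twice on the second page to exercise the retry path.
data = list(range(2500))
failures = {"left": 2}


def flaky_fetch(token):
    if token == 1000 and failures["left"] > 0:
        failures["left"] -= 1
        raise QueryTimeout("cancelled")
    return fetch_page(data, token)


result = list(iterate_with_resume(flaky_fetch))
print(len(result), "records retrieved")
```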
It’s not a limitation of the instances I’m using either, as the part of the application that uses Azure SQL works on a subset of the data, but still the same number of records, and is able to return in a fraction of the time. Indeed the issue seems to come from the fact that Azure Tables lacks the ability to iterate and re-runs the giant query every time I request the next 1000 records. This often runs into the execution time limit which terminates all connections from my instance to the storage, causing a flurry of errors to occur. The solution seems clear though: I need to move off Azure Tables and onto Azure SQL.
Realistically I should’ve realised this a lot sooner as there are numerous queries I make on things other than the partition and row keys which are critical to the way my application functions. This comes with its own challenges as scaling out the application becomes a lot harder but honestly I’m kidding myself by thinking I’ll need that level of scalability any time soon, especially when I can simply move database tables around on Azure instances to get the required performance and once that’s not enough I’ll finally try to understand SQL Federations properly and that will sort it for good.
Windows Azure Tables are one of those newfangled NoSQL type databases that excel in storing giant swaths of structured data. For what they are they’re quite good as you can store very large amounts of data in there without having to pay through the nose like you would for a traditional SQL server or an Azure instance of SQL. However that advantage comes at a cost: querying the data on anything but the partition key (think of it as a partition of the data within a table) and the row key (the unique identifier within that partition) results in queries that take quite a while to run, especially when compared to its SQL counterparts. There are ways to get around this, however no matter how well you structure your data eventually you’ll run up against this limitation and that’s where things start to get interesting.
By default whenever you do a large query against an Azure Table you’ll only get back 1000 records, even if the query will return more. However if your query did have more results than that you’ll be able to access them via a continuation token that you can add to your original query, telling Azure that you want the records past that point. For those of us coding on the native .NET platform we get the lovely benefit of having all of this handled for us directly by simply adding .AsTableServiceQuery() to the end of our LINQ statements (if that’s what you’re using) which will handle the continuation tokens for us. For most applications this is great as it means you don’t have to fiddle around with the rather annoying way of extracting those tokens out of the response headers.
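For anyone without that convenience, the loop that .AsTableServiceQuery() saves you from writing looks roughly like this Python sketch. The continuation header and query parameter names are the ones the Table service uses; fake_table_service is a made-up in-memory stand-in for the real HTTP call, so treat the rest as illustration only.

```python
NEXT_PK = "x-ms-continuation-NextPartitionKey"
NEXT_RK = "x-ms-continuation-NextRowKey"


def fake_table_service(entities, params, page_size=1000):
    """Hypothetical stand-in for the service: returns (headers, page).
    Entities are (partition_key, row_key) tuples kept in sorted order."""
    start_key = (params.get("NextPartitionKey", ""), params.get("NextRowKey", ""))
    remaining = [e for e in entities if e >= start_key]
    page, rest = remaining[:page_size], remaining[page_size:]
    headers = {}
    if rest:
        # Tell the caller where the next page starts.
        headers[NEXT_PK], headers[NEXT_RK] = rest[0]
    return headers, page


def query_all(entities):
    """Re-issue the query with the continuation values until none come back."""
    params, results = {}, []
    while True:
        headers, page = fake_table_service(entities, params)
        results.extend(page)
        if NEXT_PK not in headers:
            return results
        params = {"NextPartitionKey": headers[NEXT_PK],
                  "NextRowKey": headers[NEXT_RK]}


entities = sorted(("p%d" % (i % 3), "%05d" % i) for i in range(2500))
fetched = query_all(entities)
print(len(fetched), "entities fetched")
```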
Of course that leads you down the somewhat lazy path of not thinking about the kinds of queries you’re running against your Tables and this can lead to problems down the line. Since Azure is a shared service there are upper limits on how long queries can run and how much data they can return to you. These limits aren’t exactly set in stone and depending on how busy the particular server you’re querying is or the current network utilization at the time your query could either take an incredibly long time to return or could simply end up getting closed off. Anyone who’s developed for Azure in the past will know that this is pretty common, even for the more robust things like Azure SQL, but there’s one thing that I’ve noticed over the past couple weeks that I haven’t seen mentioned anywhere else.
As the above paragraphs might indicate I have a lot of queries that try and grab big chunks of data from Azure Tables and have, of course, coded in RetryPolicies so they’ll keep at it if they should fail. There’s one thing that all the policies in the world won’t protect you from however and that’s connections that are forcibly closed. I’ve had quite a few of these recently and I noticed that they appear to come in waves, rippling through all my threads causing unhandled exceptions and forcing them to restart themselves. I’ve done my best to optimize the queries since then and the errors have mostly subsided but it appears that should one long running query trigger Azure to force the connection closed all connections from that instance to the same Table storage will also be closed.
Depending on how your application is coded this might not be an issue, however for mine, where the worker role has about 8 concurrent threads running at any one time all attempting to access the same Table Storage account, it means one long running query that gets terminated triggers a cascade of failures across the rest of the threads. For the most part this was avoided by querying directly on row and partition keys, however the larger queries had to be broken up using the continuation tokens and then the results concatenated in memory. This introduces another limit on particular queries (as storing large lists in memory isn’t particularly great) which you’ll have to architect your code around. It’s by no means an unsolvable problem, however it was one that has forced me to rethink certain parts of my application which will probably need to be on Azure SQL rather than Azure Tables.
Like any cloud platform Azure is a great service which requires you to understand what its various services are good for and what they’re not. I initially set out to use Azure Tables for everything and have since found that it’s simply not appropriate for that, especially if you need to query on parameters that aren’t the row or partition keys. If you have connections being closed on you inexplicably be sure to check for any potentially long running queries on the same role because, as this post can attest, they could very well be the source of what ails you.
Like all industry terms the definitions of what constitutes a cloud service have become somewhat loose as every vendor puts their own particular spin on it. Whilst many cloud products share a baseline of particular features (i.e. high automation, abstraction from underlying hardware, availability as far as your credit card will go) what’s available after that point becomes rather fluid which leads to the PR department making some claims that don’t necessarily line up with reality, or at least what I believe the terms actually mean. For Microsoft’s cloud offering in Azure this became quite clear during the opening keynotes of TechEd 2012 and the subsequent sessions I attended made it clear that the current industry definitions need some work in order to ensure that there’s no confusion around what the capabilities of each of these cloud services actually are.
If this opening paragraph sounds familiar then I’m flattered, you read one of my LifeHacker posts, but there was something I didn’t dive into in that post that I want to explore here.
It’s clear that there are actually 3 different clouds in Microsoft’s arsenal: the private cloud that’s a combination of System Center Configuration Manager and Windows Server, what I’m calling the Hosted Private Cloud (referred to as Public by Microsoft) which is basically the same as the previous definition except it’s running on Microsoft’s hardware, and lastly Windows Azure which is the true public cloud. All of these have their own set of pros and cons and I still stand by my statements that the dominant cloud structure in the future will be some kind of hybrid version of all of these but right now the reality is that not a single provider manages to bridge all these gaps, and this is where Microsoft could step in.
The future might be looking more and more cloudy by the day however there’s still a major feature gap between what’s available in Windows Azure when compared to the traditional Microsoft offerings. I can understand that some features might not be entirely feasible at a small scale (indeed many will ask what the point of having something like Azure Table Storage working on a single server would achieve, but hear me out) but Microsoft could make major inroads to Azure adoption by making many of the features installable in Windows Server 2012. They don’t have to come all at once, indeed many of the features in Azure become available in a piecemeal fashion, but there are some key features that I believe could provide tremendous value for the enterprise and ease them into adoption of Microsoft’s public cloud offerings.
SQL Azure Federations for instance could provide database sharding to standalone MSSQL servers giving a much easier route to scaling out SQL than the current clustering solution. Sure there would probably need to be some level of complexity added in for it to function in smaller environments but the principles behind it could easily translate down into the enterprise level. If Microsoft was feeling particularly smart they could even bundle in the option to scale records out onto SQL Azure databases, giving enterprises that coveted cloud burst capability that everyone talks about but no one seems to be able to do.
In fact I believe that pretty much every service provided by Azure, from Table storage all the way down to the CDN interface, could be made available as a feature on Windows Server 2012. They wouldn’t be exact replicas of their cloudified brethren but you could offer API consistency between private and public clouds. This I feel is the ultimate cloud service as it would allow companies to start out with cheap on premise infrastructure (or more likely leverage current investments) and then build out from there. Peaky demands could then be easily scaled out to the public cloud and, if the cost is low enough, the whole service could simply transition there.
These features aren’t something that will readily port overnight but if Microsoft truly is serious about bringing cloud capabilities to the masses (and not just hosted virtual machine solutions) then they’ll have to seriously look at providing them. Heck just taking some of the ideals and integrating them into their enterprise products would be a step in the right direction, one that I feel would win them almost universal praise from their consumers.
I’m a stickler for avoiding rework where I can, opting instead to make the most of what I already have before I set out on trying to rework something. You’d think that’d lead me to create overly complicated systems that have multiple nuances and edge cases but since I know I hate reworking stuff I’ll go out of my way to make things right the first time, even if it costs me a bit more initially. For the most part this works well and even when it comes time to dump something and start over again much of my previous work will make it into the reworked product, albeit in a different form.
I hit such a dilemma last weekend when I was working on my latest project. As long time readers will know I’m a pretty big fan of Microsoft’s Azure services and I decided to use them as the platform for my next endeavour. For the most part it’s been quite good, getting started with the development environment was painless and once I got familiar with the features and limitations of the Azure platform I was able to create the basic application in almost no time at all. Everything was going great until I started to hit some of the fundamental limitations of one of Azure’s services, namely the Table Storage.
For the uninitiated Azure Table Storage is like a database, but not in the traditional sense. It’s one of them newfangled NoSQL type databases, the essential difference being that this kind of database doesn’t have a fixed schema or layout of how the data is stored. Considering that having a fixed layout of how the data is stored is where a database draws many of its advantages from you’d wonder what doing away with it would do for you. What it does is allow for a much higher level of scalability than a traditional database does and thus NoSQL type databases power many large apps, including things like Facebook and Twitter. Figuring that the app might be big one day (and given Microsoft’s rather ludicrous pricing for SQL Azure) I settled on using it as my main data store.
However whilst there’s a lot of good things about Azure Table Storage there’s one downside that really hurts its usability: its limited query engine. You see whilst you can query it with good old fashioned LINQ the query parameters it supports are rather limited. In fact they’re limited to single parameter matches or boolean equivalences which, whilst working for a lot of use cases, don’t cater to user constructed queries very well. Indeed in my application someone could search for a single name but the object could contain up to 8 (some of them set, some of them not), which meant that I had to construct the query on the fly for the user. No problem I hear you say, LINQKit’s Predicate Builder can build that for you! Well you’d be wrong unfortunately since the resulting LINQ statement confuses the poor Azure Storage Client and the query errors out.
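To make the "construct the query on the fly" problem concrete, here is a Python sketch that builds an OData style filter string, the format the Table service accepts at the REST level, by OR-ing an equality test across every field that might hold the name. This is not what I ended up doing and it sidesteps LINQ entirely; the property names Name1 to Name8 are hypothetical stand-ins for the real entity's fields.

```python
def quote(value):
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def name_filter(search_name, name_fields):
    """OR together an equality test for every field that might hold the name."""
    clauses = ["(%s eq %s)" % (field, quote(search_name)) for field in name_fields]
    return " or ".join(clauses)


# Hypothetical property names; the real entity's fields aren't given here.
fields = ["Name%d" % n for n in range(1, 9)]
two_field_filter = name_filter("O'Brien", fields[:2])
print(two_field_filter)  # (Name1 eq 'O''Brien') or (Name2 eq 'O''Brien')
```

Even if the service accepts a filter like this, none of those fields are partition or row keys, so it still means a scan of whatever range you point it at.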
So at this point I was faced with a difficult decision: manually crank out all the queries (which would end up being huge and ridiculously unmaintainable) whilst keeping my Table Storage back end or bite the bullet and move everything into SQL Azure. Whilst I knew that writing out the queries would be a one time only task (a very time consuming one) I couldn’t shake that feeling that doing that would just be the wrong thing to do in the long run, leaving me with an unmaintainable system that I’d curse constantly. I haven’t made the changes yet, that’s this weekend’s goal, but I know it’s not going to be as trouble free as I hope it will.
Sometimes you just have to swallow that bitter pill and it’s usually better to do it sooner rather than later. Azure Table Storage was perfect for me in the beginning but as my requirements evolved the reality of the situation became apparent and I’m stuck in the unfortunate position of having to do rework that I tried so hard to avoid. My project and I will be better for it but it’s always tough when you’ve tried everything you could in order to avoid it and came up empty.