If you’re a developer like me, you’ve likely built up a set of expectations about the way you handle data. Most of them probably have their roots in the object-oriented/relational paradigm, meaning you expect to be able to get some insight into your data by running a few queries against it or just looking at the table, perhaps sorting it to find what you’re after. The day you decide to try out something like Azure Table storage, however, you’ll find that these tools simply aren’t available to you any more due to the nature of the service. It’s at this point that, if you’re like me, you’ll get a little nervous, as your data can end up feeling like something of a black box.
A while back I posted about how I was over-thinking the scalability of my Azure application and how I was about to make the move to Azure SQL. That’s been my task for the past 3 weeks or so, and what started out as the relatively simple job of moving data from one storage mechanism to another has turned into a herculean task that has seen me dive deeper into both Azure Tables and SQL than I ever have previously. Along the way I’ve found out a few things that, whilst not changing my mind about migrating away from Azure Tables, certainly would have made my life a whole lot easier had I known about them earlier.
1. If you need to query all the records in an Azure table, do it partition by partition.
The not-so-fun thing about Azure Tables is that unless you’re keeping track of your data in your application, there are no real metrics you can dredge up to give you an idea of what you’ve actually got. For me this meant that I knew the count of one table (thanks to some background processing I do using it), but for the other 2 I had no idea how much data was actually in there. Estimates from my development database led me to believe there was an order of magnitude more data in there than I thought, which in turn led me to the conclusion that using .AsTableServiceQuery() to return the whole table was doomed from the start.
However, Azure Tables isn’t too bad at returning an entire partition’s worth of data, even if the records number in the tens or hundreds of thousands. Sure, the query time goes up linearly with the number of records (Azure Tables will only return a maximum of 1000 records per request), but if they’re all within the same partition you avoid the troublesome table scan that dramatically degrades query performance, sometimes to the point of the query being cancelled, which isn’t handled by the default RetryPolicy framework. If you need all the data in the entire table, you can query each partition in turn, dump the results into a list inside your application, and continue your processing from there.
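As a rough sketch of that partition-by-partition approach using the legacy WindowsAzure.StorageClient API the post is describing (the entity type `MyEntity`, the table name `MyTable` and the list of partition keys are placeholders for your own schema):

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
var context = account.CreateCloudTableClient().GetDataServiceContext();

var knownPartitionKeys = new[] { "2012-11", "2012-12", "2013-01" }; // placeholder
var allRecords = new List<MyEntity>();

foreach (var partitionKey in knownPartitionKeys)
{
    // Filtering on PartitionKey keeps each query inside a single partition,
    // avoiding the full table scan; AsTableServiceQuery() follows the
    // 1000-record continuation tokens for you.
    var query = context.CreateQuery<MyEntity>("MyTable")
                       .Where(e => e.PartitionKey == partitionKey)
                       .AsTableServiceQuery();

    allRecords.AddRange(query.Execute());
}
```

The catch, of course, is that you need to know your partition keys up front, either from your application’s own bookkeeping or because they follow a predictable scheme like the dates assumed above.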
2. Optimize your context for querying or updating/inserting records.
Unbeknownst to me, the TableServiceContext class has quite a few configuration options that let you change the way the context behaves. The vast majority of the errors I was experiencing came from my background processor, which primarily reads data without making any modifications to the records. If your application fits this pattern, it’s best to set Context.MergeOption to MergeOption.NoTracking, which stops the context from attempting to track the entities.
If you have multiple threads running, or queries that return large numbers of records, this can lead to a rather large improvement in performance, as the context doesn’t have to track changes to the entities and the garbage collector can free them up even if you use the context for another query. Of course, this means that if you do need to make any changes you’ll have to switch the merge option back and attach the entity in question, but you’re probably doing that already. Or at least you should be.
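Sketched out, the read-heavy setup and the attach-then-update path might look something like this (again `MyEntity`/`MyTable` are placeholders, and the `"*"` ETag forces an unconditional overwrite, which you may not want if you care about concurrent writers):

```csharp
// Read-heavy path: tell the context not to track the entities it returns.
context.MergeOption = MergeOption.NoTracking;

var results = context.CreateQuery<MyEntity>("MyTable")
                     .Where(e => e.PartitionKey == "reports") // placeholder key
                     .AsTableServiceQuery()
                     .Execute()
                     .ToList();

// To modify one of those untracked entities, switch the merge option back
// and explicitly attach it before saving.
context.MergeOption = MergeOption.PreserveChanges;

var entity = results.First();
entity.Processed = true;

context.AttachTo("MyTable", entity, "*");
context.UpdateObject(entity);
context.SaveChangesWithRetries();
```

Keeping two contexts around, one tracking and one not, is another way to slice this if your read and write paths are cleanly separated.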
3. Modify your web.config or app.config file to dramatically improve performance and reliability.
For some unknown reason, the default number of concurrent HTTP connections a Windows Azure application can make to a single host (although I get the feeling this affects all applications built on the .NET Framework) is set to 2. Yes, just 2. This then manifests itself as all sorts of crazy errors that don’t make much sense, like “the underlying connection was closed”, when you try to make more than 2 requests at any one time (which includes queries to Azure Tables). The maximum number of connections you can usefully specify depends on the size of the instance you’re using, but Microsoft has a helpful guide on how to set this and other settings in order to make the most of it.
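The connection limit can be raised in the `<system.net>` section of your web.config or app.config; the limit of 48 below is just an illustrative value, to be tuned to your instance size per Microsoft’s guidance:

```xml
<configuration>
  <system.net>
    <connectionManagement>
      <!-- Raise the default per-host limit of 2 outbound HTTP connections.
           48 is a placeholder; tune it to your instance size. -->
      <add address="*" maxconnection="48" />
    </connectionManagement>
  </system.net>
</configuration>
```

The same limit can also be set in code via `ServicePointManager.DefaultConnectionLimit` at application start-up, which is handy for worker roles without a config transform.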
Additionally, some of the guys at Microsoft have collected a bunch of tips for improving the performance of Azure Tables in various circumstances. I’ve cherry-picked the ones I’ve confirmed worked wonders for me, but there are a fair few more in there that might be of use to you, especially if you’re looking to get every performance edge you can. Many of them are circumstantial, and some require you to plan out your storage architecture in advance (so not something that can be easily retrofitted into an existing app), but since the others have worked I hazard a guess these would too.
I might not be making use of some of these tips now that my application is moving to SQL and TOPAZ, but if I can save anyone the trouble I went through sorting out all those esoteric errors, I can at least say it was worth it. Some of these tips are just good to know regardless of the platform you’re on (like the default HTTP connection limit) and should be incorporated into your application as soon as it’s feasible. I’ve yet to get all my data into production as it’s still migrating, but I get the feeling I might go on another path of discovery with Azure SQL in the not too distant future, and I’ll be sure to share my tips for it then.
3 years. That’s how long I’ve been writing about the R18+ rating in Australia. I had thought that I was pretty much done with it when the rating sailed through the lower house 6 months ago but a week ago the guidelines for the new rating were released by the Australian Classification Board and the gaming community collectively sighed in dismay at what was presented. Taking a look over the guidelines it’s clear that the idea of a unified classification scheme for all forms of media will never come into reality in Australia as apparently games must be treated differently to all other mediums of expression. Their reasoning for this might look sound on the surface (games are interactive and thus more impactful) but their thinking isn’t based on any science I can find and we all know how angry that makes me.
The guidelines themselves are short and concise, which makes them rather easy to compare to their previous iterations. Whilst the R18+ rating does open the doors to games that are adult in nature, there are some pretty severe restrictions when compared to its sister medium of film. Indeed, if you put film’s version of the R18+ guidelines next to the one for games, the number of justifications, limits and “in context” qualifiers in the latter makes the comparison quite stark, showing that the classification board believes games are more impactful due to their interactive nature. I’ve heard this line before but had never actually done any research into whether it was true.
Today I found out that it’s not.
Whilst it’s hard to find causative links between video games and any sort of trend in behaviour, owing to the impossibility of doing proper controlled testing, there is some decent data out there. Meta-analyses of previous studies can show trends in the data from which we can draw correlations. Before you repeat the “correlation is not causation” mantra at me, don’t forget that correlation is required for causation¹, so any time it pops up the relationship almost always warrants further investigation. In this case, whilst the research suggests that violent media may lead to increased aggression, that does not directly translate to increased violence, and violent media is never the sole factor responsible.
What the research does show, however, is that the tendency towards aggressive behaviours is no more influenced by interactive games than by passive consumption of other forms of media. Indeed, further research shows that contextual justification of violence is far more influential than the interactivity or quantity of violence present. Thus the idea that games must somehow be held to a different standard than other mediums due to their interactivity is at best an emotional argument, and not one we should be basing laws around.
Of course, since these are a set of guidelines it ultimately comes down to the reviewers to enforce them, and there’s a chance they won’t do so literally. Indeed, many games that got slapped with R18+ ratings in other countries were previously waved through under MA15+ here in Australia, and it’s quite possible that with the introduction of the R18+ rating many of the games that previously fell under the RC banner will be waved through in much the same way. This is pure speculation on my part, however, and we shall have to wait for the first lot of R18+ games to come through the ACB before we’ll know if there’s any credence to that theory.
It makes me incredibly angry to see policy based on emotional arguments rather than solid research. If I can find the right articles in the couple of hours I spend researching these things, then I’d expect nothing less from the public servants who are paid to do the same in order to advise their politicians. I can only hope that the government takes the advice of the ALRC seriously and looks towards unifying the classification scheme, so we can abandon these silly arrangements of differing levels of classification for different types of media. It’s another long shot for sure, but after 3 years of shouting to get to this point I’m not about to give up now.
¹And for those smart asses out there who will then tell me that you can have causation without correlation I’ll tell you to go back to your data and have a good hard look at it. If SPSS tells you that there’s no correlation in the data when you somehow know there is then there’s a problem with your data or hypothesis.