If you’re a developer like me you’ve likely got a set of expectations about the way you handle data. Most likely they all have their roots in the object-oriented/relational paradigm, meaning you’d expect to be able to get some insight into your data simply by running a few queries against it, or by looking at the table and perhaps sorting it to find something out. The day you decide to try out something like Azure Table storage, however, you’ll find that these tools simply aren’t available to you any more due to the nature of the service. It’s at this point that, if you’re like me, you’ll get a little nervous, as your data can end up feeling like something of a black box.
A while back I posted about how I was over-thinking the scalability of my Azure application and was about to make the move to Azure SQL. That’s been my task for the past three weeks or so, and what started out as the relatively simple job of moving data from one storage mechanism to another has turned into a herculean task that has seen me dive deeper into both Azure Tables and SQL than I ever have previously. Along the way I’ve found out a few things that, whilst not changing my mind about the migration away from Azure Tables, certainly would have made my life a whole bunch easier had I known about them earlier.
1. If you need to query all the records in an Azure table, do it partition by partition.
The not-so-fun thing about Azure Tables is that unless you’re keeping track of your data in your application there are no real metrics you can dredge up to give you some idea of what you’ve actually got. For me this meant that I knew the count of one table (thanks to some background processing I do using it), but there were two others where I had absolutely no idea how much data was actually in there. Estimates using my development database led me to believe there was an order of magnitude more data than I thought, which in turn led me to the conclusion that using .AsTableServiceQuery() to return the whole table was doomed from the start.
However Azure Tables isn’t too bad at returning an entire partition’s worth of data, even if the records number in the tens or hundreds of thousands. Sure, the query time goes up linearly with the number of records (as Azure Tables will only return a maximum of 1,000 records at a time), but if they’re all within the same partition you avoid the troublesome table scan, which dramatically affects the performance of the query, sometimes to the point of it getting cancelled, something the default RetryPolicy framework doesn’t handle. If you need all the data in the entire table you can query each partition in turn, dump the results into a list inside your application, and then run your query over that.
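A rough sketch of that partition-by-partition approach might look like the following, assuming the 1.x storage client library (Microsoft.WindowsAzure.StorageClient); MyEntity, the table name and the list of partition keys are all hypothetical stand-ins for your own:

```csharp
// Sketch only: MyEntity is assumed to be a TableServiceEntity
// subclass, and partitionKeys a known list of partition keys.
var account = CloudStorageAccount.Parse(connectionString);
var context = account.CreateCloudTableClient().GetDataServiceContext();

var allRecords = new List<MyEntity>();
foreach (var partitionKey in partitionKeys)
{
    // Filtering on PartitionKey keeps each query inside a single
    // partition, avoiding the table scan. AsTableServiceQuery()
    // returns a CloudTableQuery<T> whose Execute() transparently
    // follows the 1,000-record continuation tokens for you.
    var query = (from e in context.CreateQuery<MyEntity>("MyTable")
                 where e.PartitionKey == partitionKey
                 select e).AsTableServiceQuery();

    allRecords.AddRange(query.Execute());
}
// allRecords now holds the whole table and can be queried in memory.
```

The catch, of course, is that you need to know your partition keys up front, which is one more reason to track them somewhere in your application.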
2. Optimize your context for querying or updating/inserting records.
Unbeknownst to me, the TableServiceContext class has quite a few configuration options that allow you to change the way the context behaves. The vast majority of errors I was experiencing came from my background processor, which primarily reads data without making any modifications to the records. If your application fits that pattern then it’s best to set the context’s MergeOption property to MergeOption.NoTracking, which stops the context from attempting to track the entities.
If you have multiple threads running, or queries that return large numbers of records, this can lead to a rather large improvement in performance, as the context doesn’t have to track changes to the entities and the garbage collector can free those objects even if you reuse the context for another query. Of course this means that if you do need to make any changes you’ll have to attach the entity in question first, but you’re probably doing that already. Or at least you should be.
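As a hedged sketch of that read-mostly/occasional-write pattern (again using the 1.x library, with MyEntity, the table name and the partition key as hypothetical placeholders):

```csharp
// Read-heavy path: tell the context not to track entities at all.
context.MergeOption = MergeOption.NoTracking;
var batch = (from e in context.CreateQuery<MyEntity>("MyTable")
             where e.PartitionKey == "some-partition"  // hypothetical key
             select e).AsTableServiceQuery().Execute().ToList();

// Occasional write: re-attach the entity so the context tracks it
// again. The "*" ETag makes the update unconditional.
var entity = batch.First();
context.AttachTo("MyTable", entity, "*");
entity.Processed = true;                 // Processed is a made-up property
context.UpdateObject(entity);
context.SaveChangesWithRetries();
```

The unconditional "*" ETag is the blunt-instrument option; if you care about concurrent writers you’d attach with the entity’s real ETag instead and handle the precondition-failed response.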
3. Modify your web.config or app.config file to dramatically improve performance and reliability.
For some unknown reason the default number of HTTP connections a Windows Azure application can make to a single host (although I get the feeling this affects all applications built on the .NET framework) is set to 2. Yes, just 2. This then manifests itself as all sorts of crazy errors that don’t make a whole bunch of sense, like “the underlying connection was closed”, when you try to make more than 2 requests at any one time (which includes queries to Azure Tables). The maximum number of connections you can specify depends on the size of the instance you’re using, but Microsoft has a helpful guide on how to set this and other settings to make the most of it.
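For reference, the standard .NET knob for this lives in the connectionManagement section of your web.config or app.config; the value 48 below is purely illustrative, so tune it per Microsoft’s guidance for your instance size:

```xml
<!-- web.config / app.config: raise the default per-host HTTP
     connection limit (the .NET default is 2). -->
<configuration>
  <system.net>
    <connectionManagement>
      <add address="*" maxconnection="48" />
    </connectionManagement>
  </system.net>
</configuration>
```

The same limit can also be set programmatically via ServicePointManager.DefaultConnectionLimit during role startup, which is handy when you want it to vary by instance size.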
Additionally some of the guys at Microsoft have collected a bunch of tips for improving the performance of Azure Tables in various circumstances. I’ve cherry-picked the best ones, which I can confirm have worked wonders for me, but there’s a fair few more in there that might be of use to you, especially if you’re looking for every performance edge you can get. Many of them are circumstantial and some require you to plan out your storage architecture in advance (so they can’t easily be retrofitted into an existing app), but since the others have worked I’d hazard a guess that these would too.
I might not be making use of some of these tips now that my application is moving to SQL and TOPAZ, but if I can save anyone the trouble I went through sorting out all those esoteric errors then I can at least say it was worth it. Some of these tips are just good to know regardless of the platform you’re on (like the default HTTP connection limit) and should be incorporated into your application as soon as it’s feasible. I’ve yet to get all my data into production as it’s still migrating, but I get the feeling I might go on another path of discovery with Azure SQL in the not too distant future, and I’ll be sure to share my tips for it then.
I’ve been a keen user of social tools for a while now, over four years if memory serves, and if I’m honest I’d have to say that whilst they’ve been extremely useful in my personal life they’ve really done nothing for me professionally. Sure, Facebook and Twitter helped get this blog out of the doldrums of seeing an average of 1 page view a day (rocketing it to a whopping 10 per day, woo!), but apart from a single piece of software to review I haven’t really furthered my career or future prospects for wealth through these channels. I could put that down to a major lack of trying, though, since my career has done pretty well without me having to rely on my social network.
I guess I’m just lucky that I’m in an industry that’s mostly meritocratic.
However recently I’ve started to get noticed by people who’ve found me through my social networking exploits, mostly through LinkedIn. Now the profile I have up there is pretty rudimentary, with the only updates I’ve made over the past few years being to update my current job and put a profile picture on there. Still, the past two months have seen me receive multiple phone calls, connection requests and emails all originating from LinkedIn. All of them are recruiters, either eager to put me in a position they have or to build their social networks so they have a bigger candidate database, neither of which I’m particularly interested in at this current time.
You see, whilst my profile might be public for everyone to see, I’m not one of those people who makes connections for connections’ sake. It’s like any other social network to me: if I friend you on Facebook I consider you a friend, and if I follow you on Twitter it means I’m interested in what you have to say. A connection on LinkedIn means I’ve worked with you in some capacity in the past, or that I see potential value in maintaining a business-style relationship with you. An unsolicited request from a recruiter matches none of these rules; it dilutes the network of people I’ve curated and creates value only for the recruiter. Sure, it’s flattering that they consider me valuable enough on face value to want to connect with me, but they’ve also done that with hundreds of other people, so it means a lot less than they think it does.
For the most part though the requests are pretty harmless. I’ll get a single email asking to join my network and simply ignore it, since I have no idea who they are and, not being in the market for a new job, I have no interest in establishing a relationship with them. However there was one persistent bugger who not only sent me multiple connection requests but also emailed me several times and dredged up my phone number from an old resume he’d pilfered from a previous employer. I thought he would’ve got the hint after I hadn’t responded to him for two weeks, but I guess I underestimated just how desperate some of these people can get.
You know how most of the recruiters I talk to got past the initial barrier? They offered to come see me in person and have a chat about what my needs might be. If you’re not willing to get past the barrier of a simple half-hour meeting with me then I’m not going to be interested in giving you the recruiting bonuses and recurring commissions that one of my contracts will get you. Sure it’s a small thing, but it shows me that you’re not just interested in fleshing out your candidate database and, more importantly, it gives me a chance to see if you’ll provide more value than just pimping me out to job agencies. Market knowledge is as important to me as your ability to find jobs when I need them.
Could this all be solved by simply taking my LinkedIn profile down? Sure, but since I’m a massive control freak I’d like to have control over the presence I have on the web, and with many people now Googling potential employees that presence counts for a lot. I may have to deal with the odd obnoxious recruiter and may never realize any real value from it, but I feel it’s still far better to have it than not. Well, at least until this blog hits the number 1 spot on Google for David Klemke, which it can’t be far off doing now.