Windows Azure Tables are one of those newfangled NoSQL databases that excel at storing giant swaths of structured data. For what they are they’re quite good, as you can store very large amounts of data without having to pay through the nose like you would for a traditional SQL Server or a SQL Azure instance. That advantage comes at a cost, however: querying the data on anything but the partition key (think of it as a partition of the data within a table) and the row key (the unique identifier within that partition) results in queries that take quite a while to run, especially when compared to their SQL counterparts. There are ways to get around this, but no matter how well you structure your data you’ll eventually run up against this limitation, and that’s where things start to get interesting.
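To make the key-based lookup concrete, here’s a sketch using the 2012-era .NET storage client (`Microsoft.WindowsAzure.StorageClient`); the `CustomerEntity` class, the “Customers” table and the key values are all made up for illustration:

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical entity: PartitionKey and RowKey are the only indexed properties.
public class CustomerEntity : TableServiceEntity
{
    public CustomerEntity() { }
    public CustomerEntity(string region, string customerId) : base(region, customerId) { }
    public string Name { get; set; }
}

public static class TableLookups
{
    // Fast: both keys are specified, so the service can jump straight to the entity.
    public static CustomerEntity PointQuery(CloudStorageAccount account)
    {
        TableServiceContext ctx = account.CreateCloudTableClient().GetDataServiceContext();
        return ctx.CreateQuery<CustomerEntity>("Customers")
                  .Where(c => c.PartitionKey == "au-east" && c.RowKey == "cust-1001")
                  .FirstOrDefault();
    }

    // Slow: filtering on a non-key property forces the service to scan entities,
    // which is exactly the kind of query that drags on a large table.
    public static IQueryable<CustomerEntity> PropertyScan(CloudStorageAccount account)
    {
        TableServiceContext ctx = account.CreateCloudTableClient().GetDataServiceContext();
        return ctx.CreateQuery<CustomerEntity>("Customers").Where(c => c.Name == "Smith");
    }
}
```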
By default, whenever you do a large query against an Azure Table you’ll only get back 1,000 records, even if the query would return more. If your query does have more results than that, you’ll be able to access them via a continuation token that you can add to your original query, telling Azure that you want the records past that point. For those of us coding on the native .NET platform there’s the lovely benefit of having all of this handled for us: simply adding .AsTableServiceQuery() to the end of our LINQ statements (if that’s what you’re using) deals with the continuation tokens automatically. For most applications this is great, as it means you don’t have to fiddle around with the rather annoying business of extracting those tokens out of the response headers.
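As a rough sketch of what that looks like in the 2012-era storage client (assuming a hypothetical CustomerEntity subclass of TableServiceEntity and a “Customers” table):

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class ContinuationDemo
{
    public static void DumpPartition(TableServiceContext context)
    {
        // Without AsTableServiceQuery() the enumeration would stop at the first
        // 1,000 entities; with it, the client follows continuation tokens for you.
        CloudTableQuery<CustomerEntity> query = context.CreateQuery<CustomerEntity>("Customers")
            .Where(c => c.PartitionKey == "au-east")
            .AsTableServiceQuery();

        foreach (CustomerEntity entity in query)
            Console.WriteLine(entity.RowKey); // each new page is fetched transparently
    }
}
```

The convenience is real, but note that the enumeration can now make an arbitrary number of round trips to the service behind your back, which is exactly the lazy path described below.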
Of course that leads you down the somewhat lazy path of not thinking about the kinds of queries you’re running against your Tables, and that can lead to problems down the line. Since Azure is a shared service there are upper limits on how long queries can run and how much data they can return. These limits aren’t exactly set in stone: depending on how busy the particular server you’re querying is, or the network utilization at the time, your query could either take an incredibly long time to return or simply end up getting closed off. Anyone who’s developed for Azure in the past will know this is pretty common, even for the more robust services like Azure SQL, but there’s one thing I’ve noticed over the past couple of weeks that I haven’t seen mentioned anywhere else.
As the above paragraphs might indicate, I have a lot of queries that try to grab big chunks of data from Azure Tables and I have, of course, coded in RetryPolicies so they’ll keep at it should they fail. There’s one thing all the policies in the world won’t protect you from, however, and that’s connections that are forcibly closed. I’ve had quite a few of these recently and I noticed that they appear to come in waves, rippling through all my threads, causing unhandled exceptions and forcing them to restart themselves. I’ve done my best to optimize the queries since then and the errors have mostly subsided, but it appears that should one long-running query trigger Azure to force the connection closed, all connections from that instance to the same Table storage will be closed as well.
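For reference, wiring a retry policy onto the table client looks something like the sketch below (the exponential back-off values shown are just the SDK defaults):

```csharp
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class RetryConfig
{
    public static CloudTableClient CreateClient(CloudStorageAccount account)
    {
        CloudTableClient client = account.CreateCloudTableClient();

        // Retry transient failures with exponential back-off. Note this only
        // covers retriable errors the client recognises; a connection the
        // service forcibly closes mid-stream can still surface as an exception
        // you need to catch yourself.
        client.RetryPolicy = RetryPolicies.RetryExponential(
            RetryPolicies.DefaultClientRetryCount,
            RetryPolicies.DefaultClientBackoff);

        return client;
    }
}
```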
Depending on how your application is coded this might not be an issue, but for mine, where the worker role has about 8 concurrent threads running at any one time all accessing the same Table Storage account, it means one long-running query that gets terminated triggers a cascade of failures across the rest of the threads. For the most part this was avoided by querying directly on row and partition keys, but the larger queries had to be broken up using the continuation tokens and the results concatenated in memory. This introduces another limit on particular queries (as storing large lists in memory isn’t particularly great) which you’ll have to architect your code around. It’s by no means an unsolvable problem, but it has forced me to rethink certain parts of my application, which will probably need to be on Azure SQL rather than Azure Tables.
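A sketch of that break-it-up approach, using the storage client’s segmented execution so only one page of results needs to be held (or processed) at a time; CustomerEntity and the processing callback are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure.StorageClient;

public static class SegmentedDemo
{
    // Walk a large query one page at a time rather than materialising the
    // whole result set; each page carries the continuation token for the next.
    public static void ProcessInPages(CloudTableQuery<CustomerEntity> query,
                                      Action<IEnumerable<CustomerEntity>> process)
    {
        ResultContinuation token = null;
        do
        {
            ResultSegment<CustomerEntity> segment =
                query.EndExecuteSegmented(query.BeginExecuteSegmented(token, null, null));
            process(segment.Results);            // handle this page, then let it go
            token = segment.ContinuationToken;   // null once the last page arrives
        } while (token != null);
    }
}
```

Whether you process each page as it arrives or concatenate them in memory is the trade-off mentioned above; the former bounds memory use, the latter gives you the full list to work with.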
Like any cloud platform, Azure is a great service which requires you to understand what its various services are good for and what they’re not. I initially set out to use Azure Tables for everything and have since found that it’s simply not appropriate for that, especially if you need to query on parameters that aren’t the row or partition keys. If you have connections being closed on you inexplicably, be sure to check for any potentially long-running queries on the same role; as this post can attest, they could very well be the source of what ails you.
Like all industry terms, the definition of what constitutes a cloud service has become somewhat loose as every vendor puts their own particular spin on it. Whilst many cloud products share a baseline of particular features (i.e. high automation, abstraction from the underlying hardware, availability as far as your credit card will go), what’s available after that point becomes rather fluid, which leads to the PR department making claims that don’t necessarily line up with reality, or at least with what I believe the terms actually mean. For Microsoft’s cloud offering in Azure this became quite clear during the opening keynotes of TechEd 2012, and the subsequent sessions I attended made it clear that the current industry definitions need some work to ensure there’s no confusion around what the capabilities of each of these cloud services actually are.
If this opening paragraph sounds familiar then I’m flattered, you read one of my LifeHacker posts, but there was something I didn’t dive into in that post that I want to explore here.
It’s clear that there are actually three different clouds in Microsoft’s arsenal: the private cloud, which is a combination of System Centre Configuration Manager and Windows Server; what I’m calling the Hosted Private Cloud (referred to as Public by Microsoft), which is basically the same as the previous definition except it’s running on Microsoft’s hardware; and lastly Windows Azure, which is the true public cloud. All of these have their own sets of pros and cons, and I still stand by my statement that the dominant cloud structure of the future will be some kind of hybrid of all of these, but right now the reality is that not a single provider manages to bridge all these gaps, and this is where Microsoft could step in.
The future might be looking more and more cloudy by the day, but there’s still a major feature gap between what’s available in Windows Azure and the traditional Microsoft offerings. I can understand that some features might not be entirely feasible at a small scale (indeed many will ask what the point of having something like Azure Table Storage on a single server would be, but hear me out), but Microsoft could make major inroads into Azure adoption by making many of its features installable on Windows Server 2012. They don’t have to come all at once, indeed many of the features in Azure became available in a piecemeal fashion, but there are some key features that I believe could provide tremendous value to the enterprise and ease them into adopting Microsoft’s public cloud offerings.
SQL Azure Federations, for instance, could provide database sharding to standalone MSSQL servers, giving a much easier route to scaling out SQL than the current clustering solution. Sure, there would probably need to be some level of complexity added for it to function in smaller environments, but the principles behind it could easily translate down to the enterprise level. If Microsoft were feeling particularly smart they could even bundle in the option to scale records out onto SQL Azure databases, giving enterprises that coveted cloud-burst capability that everyone talks about but no one seems able to deliver.
In fact I believe that pretty much every service provided by Azure, from Table storage all the way down to the CDN interface, could be made available as a feature on Windows Server 2012. They wouldn’t be exact replicas of their cloudified brethren, but you could offer API consistency between private and public clouds. This, I feel, is the ultimate cloud service as it would allow companies to start out with cheap on-premise infrastructure (or, more likely, leverage current investments) and then build out from there. Peaky demand could then be easily scaled out to the public cloud and, if the cost was low enough, the whole service could simply transition there.
These features aren’t something that will readily port overnight but if Microsoft truly is serious about bringing cloud capabilities to the masses (and not just hosted virtual machine solutions) then they’ll have to seriously look at providing them. Heck just taking some of the ideals and integrating them into their enterprise products would be a step in the right direction, one that I feel would win them almost universal praise from their consumers.
Today started out pretty much like yesterday. My typical thing of staying up just a tad too late thanks to DOTA 2, and my terrible addiction to watching the Discovery Channel if it’s on the hotel TV (you should’ve seen the gold dredging showdown I watched, it was incredible television), meant I wasn’t at 100% when I got up, but the smorgasbord of breakfast stuffs and coffee is a powerful motivator. It also seems the combination of some good old-fashioned delayed onset muscle soreness coupled with what I think is a mild cold has left me in less than stellar shape. Still, I made it to all the sessions I planned to today and some of them really impressed me, not least of which was PowerShell V3.0.
I won’t go into too much detail about it here as my post tomorrow on LifeHacker will give a better rundown of the features, but suffice to say I’m excited to use it. It might be a long time before I get to see any of it in production (my current project is only just getting onto Windows 7) but I’ll probably be playing around with it at home as there’s an awful lot of good stuff in there that I could make use of. I’m probably going to have to sweet talk my way into a TechNet/MSDN subscription though as I don’t have access to one at the moment (nudge nudge wink wink Microsoft).
I was also very impressed by the number of value-add services available from Microsoft for any kind of application. Long-time readers will know of the pains I had back when I thought I was only two steps away from being the next Internet success story, and it seems I’m not alone if Microsoft has put this much effort into giving us plebs some amazing things for free. I’ve actually got an application in the pipeline that I’ve been working on casually for the past couple of weeks and I think it’s going to be a good candidate to try some of these services out, and hopefully actually launch it instead of procrastinating endlessly.
There was one particular session I was rather disappointed in (Building Cross Device Mobile Applications Powered By SQL Azure Federations, if you were wondering) as the name led me to believe there’d be a heavy focus on the challenges of cross platform development. Unfortunately there wasn’t, as the majority of the session was dedicated to the back end infrastructure, with the cross platform part amounting to little more than “We used MonoTouch”. That’s cool and all, but it’s nothing I didn’t learn a year ago after an hour or so of Googling the different options. I can understand that they can’t really spend the majority of their time there spruiking another company’s product, but that doesn’t stop me from feeling somewhat disappointed.
Tomorrow’s my last day here and thankfully it’ll be a relatively tame affair as my current condition coupled with the potential shenanigans that I might get up to at the Hype party that’s currently raging near me could leave me as an incoherent mess. I’ll power on though because I’m crazy like that and it’d be a right shame to let an opportunity like this go to waste because I wasn’t feeling perfect on the day.
I’ve long been of the mind that whilst we’re seeing a lot of new businesses fully cloudify their operations, mostly because they have the luxury of designing their processes around these cloud services, established organisations will more than likely never achieve full cloud integration. Whether this is because of data sovereignty issues, lack of trust in the services themselves or simply fear of change doesn’t really matter, as it’s up to the cloud providers to offer solutions that will ease their customers’ transition onto the cloud platform. From my perspective it seems clear that the best way to approach this is by offering hybrid cloud solutions, ones that can leverage the customer’s current investment in infrastructure whilst giving them the flexibility of cloud services. Up until recently there weren’t many companies looking at this approach, but that has changed significantly in the past few months.
However there’s been one major player in the cloud game that’s been strangely absent in the hybrid cloud space. I am, of course, referring to Microsoft, as whilst they have extensive public cloud offerings in the form of their hosted services as well as Azure, they haven’t really been able to offer anything beyond their usual Hyper-V plus System Centre suite of products. Curiously though, Microsoft, and many others it seems, have been running with the definition of a private cloud being just that: a highly virtualized environment with dynamic resourcing. I’ll be honest, I don’t share that definition at all, as realistically that’s just Infrastructure as a Service, a critical part of any cloud service but not a cloud service in its own right.
They are, however, attempting to make inroads into the private cloud area with their latest announcement, the Service Management Portal. When I first read about this it was touted as Microsoft opening the doors for service providers to host their own little Azure clouds, but it’s in fact nothing like that at all. Indeed it just seems to be an extension of their current Software as a Service offerings, which is really nothing that couldn’t be achieved before with the tools already available. System Centre Configuration Manager 2012 appears to make this process a heck of a lot easier, mind you, but with it being only 3 months past its RTM release I can’t say it’d be in production use at scale anywhere bar Microsoft at this point in time.
It’s quite possible they’re trying a different approach to this idea after their ill-fated attempt at getting Azure clouds up elsewhere via the Azure Appliance initiative. The problem with that solution was the scale required, as the only provider I know of that actually offers the Azure services is Fujitsu, and try as you might you won’t be able to sign up for that service without engaging directly with them. That’s incredibly counter-intuitive to the way the cloud should work, so it isn’t surprising that Microsoft has struggled to make any sort of inroads using that strategy.
Microsoft really has a big opportunity here to use their captive market of organisations that are heavily invested in their product as leverage in a private/hybrid cloud strategy. First they’d need to make the Azure platform available as a Server Role on Windows Server 2012. This would then allow the servers to become part of the private computing cloud which could have applications deployed on them. Microsoft could then make their core applications (Exchange, SharePoint, etc.) available as Azure applications, nullifying the need for administrators to do rigorous architecture work in order to deploy the applications. The private cloud can then be leveraged by the developers in order to build the required applications which could, if required, burst out into the public cloud for additional resources. If Microsoft is serious about bringing the cloud to their large customers they’ll have to outgrow the silly notion that SCCM + Hyper-V merits the cloud tag as realistically it’s anything but.
I understand that no one is really doing this sort of thing currently (HP’s cloud gets close, but I’ve yet to hear about anyone who wasn’t a pilot customer seriously look at it) but Microsoft is the kind of company that has the right combination of established infrastructure in organisations, cloud services and a technically savvy consumer base to make such a solution viable. Until they offer some deployable form of Azure to their end users, any product they offer as a private cloud solution will be that in name only. Making Azure deployable, though, could be a huge boon to their business and could very well bring about a sort of reformation of the way they do computing.
I’m a stickler for avoiding rework where I can, opting instead to make the most of what I already have before I set out to rework something. You’d think that’d lead me to create overly complicated systems with multiple nuances and edge cases, but since I know I hate reworking stuff I’ll go out of my way to make things right the first time, even if it costs me a bit more initially. For the most part this works well, and even when it comes time to dump something and start over again much of my previous work will make it into the reworked product, albeit in a different form.
I hit such a dilemma last weekend when I was working on my latest project. As long-time readers will know I’m a pretty big fan of Microsoft’s Azure services and I decided to use them as the platform for my next endeavour. For the most part it’s been quite good: getting started with the development environment was painless, and once I got familiar with the features and limitations of the Azure platform I was able to create the basic application in almost no time at all. Everything was going great until I started to hit some of the fundamental limitations of one of the Azure services, namely Table Storage.
For the uninitiated, Azure Table Storage is like a database, but not in the traditional sense. It’s one of those newfangled NoSQL databases, the essential difference being that this kind of database doesn’t have a fixed schema or layout for how the data is stored. Considering that a fixed layout is where a database draws many of its advantages from, you’d wonder what doing away with it would get you. What it allows is a much higher level of scalability than a traditional database, which is why NoSQL databases power many large applications, including the likes of Facebook and Twitter. Figuring that the app might be big one day (and given Microsoft’s rather ludicrous pricing for SQL Azure) I settled on using it as my main data store.
However, whilst there are a lot of good things about Azure Table Storage, there’s one downside that really hurts its usability: its limited query engine. You see, whilst you can query it with good old-fashioned LINQ, the query operators it supports are rather limited. In fact they’re limited to single-parameter matches or boolean equivalences which, whilst workable for a lot of use cases, doesn’t cater to user-constructed queries very well. Indeed, in my application, where someone could search for a single name but the object could contain up to 8 of them (some of them set, some of them not), I had to construct the query on the fly for the user. No problem, I hear you say, LINQKit’s PredicateBuilder can build that for you! Well you’d be wrong, unfortunately, since the resulting LINQ statement confuses the poor Azure Storage Client and the query errors out.
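For what it’s worth, one alternative to PredicateBuilder is building the OR expression tree by hand, which avoids the Invoke/Expand calls the client chokes on. This is only a sketch: PersonEntity and its Name1–Name3 properties are made-up stand-ins, and whether the service executes the resulting filter efficiently is another matter entirely:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity with several optional name properties.
public class PersonEntity
{
    public string Name1 { get; set; }
    public string Name2 { get; set; }
    public string Name3 { get; set; }
}

public static class SearchFilters
{
    // Builds e => e.Name1 == search || e.Name2 == search || e.Name3 == search
    // directly as an expression tree, with no Invoke nodes for the LINQ
    // provider to trip over.
    public static Expression<Func<PersonEntity, bool>> AnyNameMatches(string search)
    {
        ParameterExpression e = Expression.Parameter(typeof(PersonEntity), "e");
        string[] nameProps = { "Name1", "Name2", "Name3" }; // assumed property names

        Expression body = null;
        foreach (string prop in nameProps)
        {
            Expression clause = Expression.Equal(
                Expression.Property(e, prop),
                Expression.Constant(search, typeof(string)));
            body = body == null ? clause : Expression.OrElse(body, clause);
        }
        return Expression.Lambda<Func<PersonEntity, bool>>(body, e);
    }
}
```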
So at this point I was faced with a difficult decision: manually crank out all the queries (which would end up being huge and ridiculously unmaintainable) whilst keeping my Table Storage back end, or bite the bullet and move everything into SQL Azure. Whilst I knew that writing out the queries would be a one-time (if very time-consuming) task, I couldn’t shake the feeling that it would be the wrong thing to do in the long run, leaving me with an unmaintainable system that I’d curse constantly. I haven’t made the changes yet, that’s this weekend’s goal, but I know it’s not going to be as trouble-free as I hope.
Sometimes you just have to swallow that bitter pill and it’s usually better to do it sooner rather than later. Azure Table Storage was perfect for me in the beginning but as my requirements evolved the reality of the situation became apparent and I’m stuck in the unfortunate position of having to do rework that I tried so hard to avoid. My project and I will be better for it but it’s always tough when you’ve tried everything you could in order to avoid it and came up empty.
Maybe it’s my corporate IT roots but I’ve always thought that the best cloud strategy would be a combination of in house resources that would have the ability to offload elsewhere when extra resources were required. Such a deployment would mean that organisations could design their systems around base loads and have the peak handled by public clouds, saving them quite a bit of cash whilst still delivering services at an acceptable level. It would also gel well with management types as not many are completely comfortable being totally reliant on a single provider for any particular service which in light of recent cloud outages is quite prudent. For someone like myself I was more interested in setting up a few Azure instances so I could test my code against the real thing rather than the emulator that comes with Visual Studio as I’ve always found there’s certain gotchas that don’t show up until you’re running on a real instance.
Now the major cloud providers (Rackspace, AWS, et al.) haven’t really expressed much interest in supporting configurations like this, which makes business sense for them since doing so would more than likely eat into their sales targets. They could license the technology, of course, but that brings with it a whole bunch of other problems, like what the supported configurations are and releasing some measure of control over the platform so that end users can deploy their own nodes. However, I had long thought that Microsoft, with its long history of letting users install stuff on their own hardware, would eventually allow Azure to run in some scaled-down fashion to facilitate this hybrid cloud idea.
Indeed many developments in their Azure product seemed to support this, the strongest of which being the VM role which allowed you to build your own virtual machine then run it on their cloud. Microsoft have offered their Azure Appliance product for a while as well, allowing large scale companies and providers the opportunity to run Azure on their own premises. Taking this all into consideration you’d think that Microsoft wasn’t too far away from offering a solution for medium organisations and developers that were seeking to go to the Azure platform but also wanted to maintain some form of control over their infrastructure.
After talking with a TechEd bound mate of mine however, it seems that idea is off the table.
VMware has had their hybrid cloud product (vCloud) available for quite some time and whilst it satisfies most of the things I’ve been talking about so far it doesn’t have the sexy cloud features like an in-built scalable NoSQL database or binary object storage. Since Microsoft had their Azure product I had assumed they weren’t interested in competing with VMware on the same level but after seeing one of the TechEd classes and subsequently browsing their cloud site it looks like they’re launching SCVMM 2012 as a direct competitor to vCloud. This means that Microsoft is basically taking the same route by letting you build your own private cloud, which is basically just a large pool of shared resources, foregoing any implementation of the features that make Azure so gosh darn sexy.
Figuring that out left me a little disappointed, but I can understand why they’re doing it.
Azure, as great as I think it is, probably doesn’t make sense in a deployment scenario of anything less than a couple hundred nodes. Much of Azure’s power, like that of any cloud provider, comes from its large number of distributed nodes, which provide redundancy, flexibility and high performance. The Hyper-V based private cloud, then, is more tailored to the lower end, where enterprises likely want more control than Azure would provide, not to mention that experience in deploying Azure instances is limited to Microsoft employees and precious few from the likes of Dell, Fujitsu and HP. Hyper-V is therefore the better solution for those looking to deploy a private cloud, and should they want to burst out to a public cloud they’ll just have to code their application to be able to do that. Such a feature isn’t impossible, but it is an additional cost that will need to be considered.
Anyone who works in IT or a slightly related field will tell you that you’ve got to be constantly up to date with the latest technology lest you find yourself quickly obsoleted. Depending on what your technology platform of choice is, the time frame you have to work in can vary pretty wildly, but you’d be doing yourself (and your career) a favour by skilling up in either a new or different technology every 2 years or so. Due to the nature of my contracts, though, I’ve found myself learning completely new technologies at least every year, and it’s only in this past contract that I’ve come back full circle to the technology I initially made my career on, but that doesn’t mean the others I learnt in the interim haven’t helped immensely.
If I was honest, though, I couldn’t say that in the past I actively sought out new technologies to become familiar with. Usually I would start a new job based on the skills I had from a previous engagement, only to find that they really required something different. Being the adaptable sort, I’d go ahead and skill myself up in that area, quickly becoming proficient enough to do the work they required. Since most of the places I worked in were smaller shops this worked quite well, since you’re always required to be a generalist in those situations. It’s only been recently that I’ve turned my eyes towards the future to figure out where I should place my next career bet.
It was a conversation that came up between me and a colleague of mine whilst I was on a business trip overseas. He asked me what I thought were some of the IT trends that were going to take off in the coming years, and I told him I thought cloud-based technologies were the way to go. At first he didn’t believe me, which was understandable since we work for a government agency and they don’t typically put any of their data in infrastructure they don’t own. I did manage to bring him around to the idea eventually, though, thanks in part to my half decade of constant reskilling.
Way back when I was just starting out as a system administrator I was fortunate enough to start out working with VMware’s technology stack, albeit in a strange incarnation of running their workstation product on a server. At the time I didn’t think it was anything revolutionary but as time went on I saw how much money was going to waste as many servers sat idle for the majority of their lives, burning power and providing little in return. Virtualization then was a fundamental change to the way that back end infrastructure would be designed, built and maintained and I haven’t encountered any mid to large sized organisation who isn’t using it in some form.
Cloud technologies then represent the evolution of this idea. I say cloud technologies and not “the cloud” deliberately, as whilst the idea of relying on external providers to do all the heavy lifting for you is extremely attractive, it unfortunately doesn’t work for everyone, especially those who simply cannot outsource. Cloud technologies and principles, however, like the idea of having massive pools of compute and storage resources that can be carved up dynamically, have the potential to change the way back end services are designed and provisioned. Most importantly, they would decouple the solution design from the underlying infrastructure, meaning that neither would dictate the other. That in itself is enough to make most IT shops want to jump on the cloud bandwagon, and some are doing so already.
It’s for that exact reason that I started developing on the Windows Azure platform and researching VMware’s vCloud solution. Whilst the consumer space is very much in love with the cloud and the benefits it provides, large scale IT is a much slower moving beast and it’s only now coming around to the idea. With the next version of Windows shaping up to be far more cloud focused than any of its predecessors, it seems quite prudent for us IT administrators to start becoming familiar with the benefits cloud technology provides, lest we be left behind by the up and comers who are betting on this burgeoning platform.