As always I’m not-so-secretly working on a side project of mine (although I’ve kept its true nature a secret from most) which uses Windows Azure as the underlying platform. I’ve been working on it for the past 3 months or so, and whilst it isn’t my first Azure application it is the first one that I’ve actually put into production. That means I’ve had to deal with all the issues that come with doing that, from building an error reporting framework to making code changes that have no effect in development but fix critical issues when the application is deployed. I’ve also come to the realisation that some of the architectural decisions I made, ones made with an eye towards future scalability, aren’t as sound as I first thought they were.
I’ve touched on some of the issues and considerations that come with Azure Tables before, but what I haven’t dug into is the reasons you would choose to use it. On the surface it looks like a stripped-down version of a relational database, missing some features but making up for it by being an extremely cheap way of storing a whole lot of data. Figuring that my application was going to be huge some day (as all us developers do), I made the decision to use Azure Tables for everything. Sure, querying the data was a little cumbersome, but there were ways to code around that, and code around I did. The end solution does work as intended when deployed into production, but there are some quirks which don’t sit well with me.
For starters, querying data from Azure Tables on anything but the partition key and row key will force a table scan. Those familiar with NoSQL-style databases will tell me that that’s the point: storage services like these are optimised for key-based access, and outside of that you’re better off using an old-fashioned SQL database. I realised this when I was developing it, however the situations I had in mind fit in well with the partition/row key paradigm, as often I’d need to get a whole partition, a single record or (and this is the killer) the entire table itself. Whilst Azure Tables might be great at the first 2 things, it’s absolutely rubbish at the last, and this causes me no end of issues.
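To make the key-versus-scan distinction concrete, here is a toy in-memory model in Python. It is not the Azure storage SDK, and the class and method names are hypothetical; it only mimics the one property that matters here: entities are indexed by partition key and row key and by nothing else, so a filter on any other property has to walk every row.

```python
from collections import defaultdict


class InMemoryTable:
    """Toy table indexed only by (PartitionKey, RowKey).

    A hypothetical illustration, not the Azure SDK: it mimics how a
    key-value table can answer key lookups cheaply but must scan for
    anything else.
    """

    def __init__(self):
        # partition key -> {row key -> entity properties}
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, row_key, **props):
        self._partitions[partition_key][row_key] = dict(props)

    def get(self, partition_key, row_key):
        # Point lookup: two dictionary probes, independent of table size.
        return self._partitions.get(partition_key, {}).get(row_key)

    def get_partition(self, partition_key):
        # Partition query: touches only the rows in that one partition.
        return list(self._partitions.get(partition_key, {}).values())

    def query_by_property(self, name, value):
        # Non-key filter: there is no index on `name`, so every entity in
        # every partition must be examined (a full table scan).
        examined, matches = 0, []
        for rows in self._partitions.values():
            for entity in rows.values():
                examined += 1
                if entity.get(name) == value:
                    matches.append(entity)
        return matches, examined


if __name__ == "__main__":
    table = InMemoryTable()
    for p in range(10):
        for r in range(100):
            table.insert(f"p{p}", f"r{r}", user=f"u{r % 7}")
    print(table.get("p3", "r42"))  # {'user': 'u0'}
    matches, examined = table.query_by_property("user", "u0")
    print(len(matches), examined)  # 150 1000
```

The point lookup and the partition query only ever touch the rows they return, whereas the property filter touches all 1,000 rows to find 150, and that ratio only gets worse as the table grows.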
In the beginning I, like most developers, simply developed something that worked. This included a couple of calls along the lines of “get all the records in this table then do something with each of them”. This worked well up until I started getting hundreds of thousands of rows needing to be returned, which often ended with the query being killed long before it could complete. Frustrated, I implemented a solution that attempted to iterate over all records in the table by requesting all of the records and then following the continuation tokens as they were given to me. This kind of worked, although anyone who’s worked with Azure and LINQ will tell you that I reinvented the wheel by forgoing the .AsTableServiceQuery() method, which does all that for you. Indeed the end result was essentially the same, and the only way around it was to put in some manual retry logic (in addition to the regular RetryPolicy). This works, but retrieving and iterating over 800,000 records takes some 5 hours to complete, which is unacceptable when I can do the same thing on my home PC in a minute or two.
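For anyone curious what “follow the continuation tokens, with manual retry logic on top” looks like, here is a language-neutral sketch in Python rather than the C#/LINQ I actually used. fetch_page is a hypothetical callable standing in for one round trip to the table service, returning a page of entities plus the next continuation token (None on the last page); TransientError and make_flaky_pager are likewise made up for illustration. In the .NET client, .AsTableServiceQuery() handles the token-following part for you.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or dropped storage connection."""


def iterate_all(fetch_page, max_retries=3, backoff_seconds=0.5):
    """Yield every entity in a table by following continuation tokens.

    fetch_page(token) is a hypothetical callable that performs one request
    and returns (entities, next_token); next_token is None on the last page.
    A failed page is requested again with the same token, up to
    max_retries extra attempts, before the error is re-raised.
    """
    token = None  # None means "start from the beginning of the table"
    while True:
        for attempt in range(max_retries + 1):
            try:
                entities, next_token = fetch_page(token)
                break
            except TransientError:
                if attempt == max_retries:
                    raise  # out of retries: surface the error to the caller
                # Exponential backoff before retrying the same page.
                time.sleep(backoff_seconds * (2 ** attempt))
        yield from entities
        if next_token is None:
            return
        token = next_token


def make_flaky_pager(total, page_size, fail_every):
    """Fake fetch_page over the integers 0..total-1 that raises
    TransientError on every fail_every-th call, to exercise the retries."""
    calls = 0

    def fetch_page(token):
        nonlocal calls
        calls += 1
        if calls % fail_every == 0:
            raise TransientError("simulated timeout")
        start = token or 0  # in this fake, the token is just the next offset
        end = min(start + page_size, total)
        return list(range(start, end)), (end if end < total else None)

    return fetch_page


if __name__ == "__main__":
    pager = make_flaky_pager(total=2500, page_size=1000, fail_every=2)
    rows = list(iterate_all(pager, backoff_seconds=0))
    print(len(rows))  # 2500
```

Retrying with the same token rather than starting over is the important bit: a dropped connection only costs you one page, not the whole walk through the table.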
It’s not a limitation of the instances I’m using either: one part of the application uses Azure SQL, and that part, which works with a subset of the data but still the same number of records, returns in a fraction of the time. Indeed the issue seems to come from the fact that Azure Tables lacks the ability to iterate and re-runs the giant query every time I request the next 1000 records. This often runs into the execution time limit, which terminates all connections from my instance to the storage, causing a flurry of errors. The solution seems clear though: I need to move off Azure Tables and onto Azure SQL.
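If that suspicion is right and every request for the next 1000 records really does re-run the query from the start (an assumption, not documented behaviour), the work grows quadratically. For 800,000 records in pages of 1000 that’s 800 requests, where the k-th request has to wade through roughly 1000k rows:

```latex
\sum_{k=1}^{800} 1000\,k \;=\; 1000 \cdot \frac{800 \cdot 801}{2} \;=\; 320{,}400{,}000 \text{ rows examined}
```

That’s roughly 400 times the 800,000 rows a true cursor would have to touch.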
Realistically I should’ve realised this a lot sooner, as there are numerous queries I make on things other than the partition and row keys which are critical to the way my application functions. The move comes with its own challenges, as scaling out the application becomes a lot harder, but honestly I’m kidding myself if I think I’ll need that level of scalability any time soon. I can simply move database tables around on Azure instances to get the required performance, and once that’s not enough I’ll finally try to understand SQL Federations properly and that will sort it for good.
Maybe I’m just hanging around the wrong places on the Internet, but lately there seems to be a higher than average level of vitriol being launched at Microsoft. From my totally arbitrary standpoint it seems that most people don’t view Microsoft as the evil empire they used to, and instead now focus on the two new giants in the tech sector, Apple and Google. This could easily be explained by the fact that Microsoft hasn’t really done anything particularly evil recently, whilst Apple and Google have been dealing with their ongoing controversies over platform lock-down and privacy respectively. Still, no fewer than two articles that squarely blame Microsoft for various problems have crossed my path of late, and I feel they warrant a response.
The first comes courtesy of the slowly failing MySpace, which has been bleeding users for almost 2 years straight now. Whilst there are numerous reasons why they’re failing (with Facebook being the most likely), one blog asked whether their choice of infrastructure was to blame:
1. Their bet on Microsoft technology doomed them for a variety of reasons.
2. Their bet on Los Angeles accentuated the problems with betting on Microsoft.
Let me explain.
The problem was, as Myspace started losing to Facebook, they knew they needed to make major changes. But they didn’t have the programming talent to really make huge changes and the infrastructure they bet on made it both tougher to change, because it isn’t set up to do the scale of 100 million users it needed to, and tougher to hire really great entrepreneurial programmers who could rebuild the site to do interesting stuff.
I won’t argue point 2, as the short time I spent in Los Angeles showed me that it wasn’t exactly the best place for acquiring technical talent (although I haven’t been to San Francisco for a good comparison, talking with friends who have seems to confirm this). However betting on Microsoft technology is definitely not the reason MySpace started on a long downward spiral several years ago, as several commenters point out in this article. Indeed MySpace’s lack of innovation appears to stem from the fact that they outsourced much of their core development work to Telligent, a company that provides social network platforms. Such an arrangement meant that they were wholly dependent on Telligent to provide updates to the platform they were using, rather than owning it entirely in house. Indeed, as a few other commenters pointed out, the switch to the Microsoft stack actually allowed MySpace to scale much further with less infrastructure than they had previously. If there was a problem with scaling, it definitely wasn’t coming from the Microsoft technology stack.
When I first started developing what became Lobaco, scalability was always nagging at the back of my head, taunting me that my choice of platform was doomed to failure. Indeed only a few start-ups have managed to make it big using the Microsoft technology stack, so it would seem that going down this path is a sure-fire way to kill any good idea in its infancy. Still, I have a heavy investment in the Microsoft line of products, so I kept on plugging away with it. Problems of scale appear to be unique to each technology stack, with all of them having their pros and cons. Realistically every company with large numbers of users has its own unique way of dealing with it, and the technology used seems to be secondary to good architecture and planning.
Still, there’s a strong anti-Microsoft sentiment amongst those in Silicon Valley. Just for kicks I’ve been thumbing through the job listings for various start-ups in the area, toying with the idea of moving there to get some real-world start-up experience. Most of them, however, want nothing to do with a Microsoft-based developer, preferring something like PHP/Rails/Node.js instead. Indeed some have gone as far as to say that .NET development is a black mark against you, only serving to limit your job prospects:
Programming with .NET is like cooking in a McDonalds kitchen. It is full of amazing tools that automate absolutely everything. Just press the right button and follow the beeping lights, and you can churn out flawless 1.6 oz burgers faster than anybody else on the planet.
However, if you need to make a 1.7 oz burger, you simply can’t. There’s no button for it. The patties are pre-formed in the wrong size. They start out frozen so they can’t be smushed up and reformed, and the thawing machine is so tightly integrated with the cooking machine that there’s no way to intercept it between the two. A McDonalds kitchen makes exactly what’s on the McDonalds menu — and does so in an absolutely foolproof fashion. But it can’t go off the menu, and any attempt to bend the machine to your will just breaks it such that it needs to be sent back to the factory for repairs.
I should probably point out that I don’t disagree with some of the points in his post, most notably how Microsoft makes everything quite easy for you if you’re following a particular pattern. The trouble comes when you try to work outside the box, and many programmers will simply not attempt anything that isn’t already solved by Microsoft. Heck, I encountered that very problem when I tried to wrangle their Domain Services API into sending and receiving JSON, a supported but wholly undocumented part of their API. I got it working in the end, but I could easily see many .NET developers simply saying it couldn’t be done, at least not in the way I was going about it.
Still, that doesn’t mean all .NET developers are simple button pushers, totally incapable of thinking outside the Microsoft box. Sure, there will be more of that type of programmer simply because .NET is used in so many places (just not Internet start-ups, by the looks of it), but to paint everyone who uses the technology with the same brush seems pretty far-fetched. Heck, if he were right there would’ve been no way for me to get my head around Objective-C, since it’s not supported by Visual Studio. Still, I became competent in 2 weeks and can now hack my way around in Xcode just fine, despite my extensive .NET heritage.
It’s always the person or company, not the technology, that limits their potential. Sure, you may hit a wall with a particular language or infrastructure stack, but if your people are capable you’ll find a way around it. I might be in the minority when it comes to trying to start a company based around Microsoft technology, but the fact is that relearning another technology stack carries a huge opportunity cost. If I do it right, however, it should be flexible enough that I can replace parts of the system with more appropriate technologies down the line, if the need arises. People pointing the finger at Microsoft for all their woes are either looking for a scapegoat so they don’t have to address the larger systemic issues, or simply looking for some juicy blog fodder.
I guess they found the latter, since I certainly did 😉
Cloud computing: it’s no secret that I don’t buy wholly into the idea, mostly because everyone talking about it either a) doesn’t understand it or b) has forgotten that it’s an old paradigm that was tried a long time ago and failed for many reasons. Still, as an IT professional who likes to stay current with emerging trends, I’ve been researching the various cloud offerings to make sure I’m up to speed on them should a manager get the bright idea to try and use them in our environment. There’s also the flip side that my chosen specialisation vendor, VMware, has its own cloud product aimed at building your own internal cloud for hosting all your applications. Whilst I still sit on the sceptical fence about the cloud idea as a whole, there are some fundamental underpinnings that I think I can make use of in my current endeavours. I might even stop feeling dirty every time I mention the cloud.
As any budding start-up engineer will tell you, one of the things that always plays on the back of your mind is how you’re going to scale your application out should it get popular. I’ve had a lot of experience with scaling corporate applications and so thought I’d have a decent handle on how to scale my application out. To give you an idea, most corporate apps scale by adding servers to each tier to handle the load and then load balancing across them (or in simple terms, throwing more hardware at it). This works well when you’ve got buckets of money and your own data centre to play with, but we lowly plebs trying to break out into the real world face similar problems of scalability without the capital to back us up.
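The load balancing half of that is conceptually simple. Below is a minimal round-robin sketch in Python; the class and server names are made up for illustration, and real load balancers add health checks, session affinity and weighting on top, but the core idea of handing each request to the next server in the rotation is the same. Scaling out is then just a matter of adding another box to the rotation.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Send each request to the next server in a fixed rotation.

    Server names are hypothetical; real balancers also track health.
    """

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def add_server(self, server):
        # Scaling out: add a box and restart the rotation from the first server.
        self._servers.append(server)
        self._rotation = cycle(self._servers)

    def route(self, request):
        # Returns (chosen server, request) so the caller can forward it.
        return next(self._rotation), request


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2"])
    print([lb.route(i)[0] for i in range(4)])  # ['web1', 'web2', 'web1', 'web2']
    lb.add_server("web3")
    print([lb.route(i)[0] for i in range(3)])  # ['web1', 'web2', 'web3']
```

The catch, as the rest of this post gets into, is that every extra name in that list is another server you have to pay for.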
Right now I host most of my stuff directly off my home connection on a single server box that I cobbled together for $300 some years ago. It’s done me well and is more than grunty enough to handle this web site, but anything above that has seen it crumble under the pressure, sometimes spectacularly. When I was looking for hosting solutions for Lobaco I knew that shared hosting wasn’t going to cut it, and getting a real server would cost far more than I was willing to pay at this early stage. In the end I got a Virtual Private Server from SoftSys Hosting for just under $600/year. At the time it was the perfect solution, as it let me mirror my current test environment with the additional benefit of being backed by a huge pipe and enterprise-level hardware. It’s been so good that I’m even considering moving this blog up there, if only so it won’t go down every time the net drops at my place.
However, my original ideas for scaling out the application don’t gel too well with the whole VPS idea. You see, scaling out in that fashion would see me buying several of these, each with the same price tag or higher. Just a simple load-balanced web and database server farm would set me back $2400/year, neglecting any other costs incurred to get it working. After weighing my options I begrudgingly looked further afield, and that’s when I started to take cloud computing a little more seriously.
Whilst I’ve still only scratched the surface of most offerings, the most compelling came in the form of Windows Azure. Apart from the fact that it’s Microsoft and should therefore be blindingly easy to use, the fact that they provide free accounts (with discounted rates thereafter) to budding entrepreneurs like myself got me intrigued. A couple of Google searches later showed that porting my WCF-based services to the Azure platform shouldn’t prove too difficult, and they provide built-in load balancing to boot. The pricing model is also attractive as it is on a unit basis: you only pay for what you actually use. Azure could then easily provide the required scalability without breaking the bank, leaving me to focus on more pressing issues than whether or not it will work with a decent chunk of users.
Whilst I won’t be diving into the clouds just yet (the iPhone, she beckons), it’s now on the cards to port Lobaco across so that when it comes time to launch I won’t have to watch my server like a hawk to make sure it isn’t dying a fiery death under the load. I don’t think the cloud will be the solution to every scalability issue that might come up in the future, but it’s looking more and more like a viable option that will let me build a robust service for a good number of users. Then once I’ve got my act together I can start planning out a real solution.
With blackjack, and hookers.