Posts Tagged 'storage'


The Memristor is Almost Ready For Prime Time.

For the amount of NVRAM that's used these days there's been comparatively little innovation in the sector. For the most part the advances have come from the traditional avenues, die shrinks and new gate technologies, with the biggest leap, 3D construction, only happening last week. There have been musings about other kinds of technology for a long time, like memristors, which had their first patent granted back in 2007 and were supposed to be making their way into our hands late last year, but that never eventuated. However news comes today of a new memory startup that's promising a lot, and whilst they don't say it directly it looks like they might be one of the first to market with memristor-based products.

[Image: Crossbar's simple CMOS integration]

Crossbar is a new company that's been working in stealth for some time on a new type of memory product which, surprisingly, isn't anything particularly revolutionary. It's called Resistive RAM (RRAM) and a little research shows that there have been companies working on this idea as far back as 2009. It's based around a fairly interesting phenomenon whereby a dielectric, an electrical insulator, can be made to conduct through the application of a high voltage. This forms a filament of low resistance which can then be reset, breaking the connection, and then set again with another high voltage jolt. The idea lends itself well to memory applications as the two states translate perfectly to binary, and if the specifications are anything to go by the performance that will come out of these devices should be quite spectacular.

If this is sounding familiar then you're probably already acquainted with the idea of memristors. These are the fourth fundamental circuit element, postulated back in 1971 by Leon Chua and made real by HP in 2008. In a basic sense their resistance is a function of the current flowing through them, and when the current is removed that resistance is remembered, hence the name. As you can see this describes the function of RRAM pretty well and there's a solid argument to be made that all RRAM technologies are in fact memristors. So whilst it's pretty spectacular that a start-up has managed to perfect this technology to the point of producing it on a production fab, it's actually technology that's been brewing for quite some time and one that everyone in the tech world is excited about.

Crossbar's secret sauce most likely lies in their fabrication process, as they claim the way they create their substrate means they should be able to stack layers, much in the same way that Samsung can now do with their V-NAND. This is exciting because HP previously alluded to memristor-based storage being much denser than NAND, several orders of magnitude denser to be precise, and considering the density gains Samsung got with their 3D chips a layered memristor device's storage capacity could be astronomical. Indeed Crossbar claims as much, with up to 1TB on a standard chip that could then be stacked, enabling multiple terabytes in a single package. That puts good old fashioned spinning rust on notice as it just couldn't compete, even when it comes to archival storage. Of course the end price will be a big factor here, but that kind of storage potential could drive the cost per GB through the floor.

So the next couple of months are going to be quite interesting: we have Samsung, the undisputed king of NAND, already in the throes of producing some of the most dense storage available, with Crossbar (and multiple other companies) readying memristor technology for the masses. In the short term I give the advantage to Samsung as they've got the capital and global reach to get their products out to anyone who wants them. However if memristor-based products can do even half of what they're claimed to be capable of they could quickly start eating Samsung's lunch, and I can't imagine it'd be too long before Samsung either bought the biggest players in the field or developed the technology themselves. Regardless of how this all plays out the storage market is heading for a shake-up, one that can't come quickly enough in my opinion.

 


The Ups and Downs of a Weekend Developing on Azure.

I heap a lot of praise on Windows Azure here, enough to start thinking it makes me sound like a Microsoft shill, but honestly I think it's well deserved. As someone who's spent the better part of a decade setting up infrastructure for applications to run on, and who then began developing said applications in his spare time, I really do appreciate not having to maintain yet another set of infrastructure. Couple that with the fact that I'm a full Microsoft stack kind of guy and it's really hard to beat the tight integration between all of the products in the cloud stack, from the development tools to the back-end infrastructure. So, like many of my weekends recently, I spent the previous one coding away on the Azure platform, and it was filled with some interesting highs and rather devastating lows.

I'll start off with the good as it was really the highlight of my development weekend. I had promised to work on a site for a long-time friend's upcoming wedding and whilst I had figured out the majority of it I hadn't gotten around to cleaning it up for a first cut to show off to him. I spent most of my time on the project getting the layout right, wrangling JavaScript/jQuery into behaving properly and spending an inordinate amount of time trying to get the HTML to do what I wanted. Once I had gotten it into an acceptable state I turned my attention to deploying it, and that's where Azure Web Sites comes into play.

For the uninitiated, Azure Web Sites are essentially a cut-down version of the Azure Web Role, allowing you to run pretty much full-scale web apps for a fraction of the cost. Of course this comes with limitations and unless you're running at the Reserved tier you're essentially sharing a server with a bunch of other people (i.e. a common multi-tenant scenario). For this site, which isn't going to receive a lot of traffic, it's perfect, and I wanted to deploy the first-run app onto this platform. Like any good admin I simply dove in head first without reading any documentation on the process and to my surprise I was up and running in a matter of minutes. It was pretty much: create the web site, download the publishing profile, click Publish in Visual Studio, import the profile and wait for the upload to finish.

Deploying a web site on my own infrastructure would have been a lot more complicated; I can't tell you how many times I've had to chase down dependency issues or missing libraries that were installed on my PC but not on the end server. The publishing profile, coupled with the smarts in Visual Studio, was able to resolve everything (the deployment console shows the whole process, which was actually quite cool to watch) and have the site up and running at my chosen URL in about 10 minutes total. That's very impressive considering this is still considered preview-level technology, although I'm more inclined to classify it as a release candidate.

Other Azure users can probably guess what I’m going to write about next. Yep, the horrific storage problems that Azure had for about 24 hours.

I noticed some issues on Friday afternoon when my current migration (yes, that one, it's still going as I write this) started behaving… weirdly. The migration is in its last throes and I expected the CPU usage to start ramping down as the multitude of threads finished their work, which lined up with what I was seeing. However the number of records migrated wasn't climbing at the rate it had been previously (usually indicative of some error I had suppressed in order for the migration to run faster), yet the logs showed that it was still going, just at a snail's pace. Figuring it was just the instance dying I reimaged it, and then the errors started flooding in.

Essentially I was disconnected from my NoSQL storage, so whilst I could browse my migrated database I couldn't keep pulling records out. This also had the horrible side effect of not letting me deploy anything as every attempt came back with SSL/TLS connection errors. Googling this led to all sorts of random posts, since the error is also shared by the libraries that power the WebClient in .NET, so it wasn't until I stumbled across the ZDNet article that I knew the problem wasn't mine. Unfortunately you were really up the proverbial creek without a paddle if your Azure application depended on this, as the temporary fixes, either disabling SSL for storage connections or usurping the certificate handler, left your application rather vulnerable to all sorts of nasty attacks. I'm one of the lucky few who could simply do without until it was fixed but it certainly highlighted the issues that can occur with PaaS architectures.
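For what it's worth, the two workarounds floating around at the time boiled down to something like the sketch below. Both are deliberately insecure, which is exactly the problem, and the account name and key are just placeholders:

```csharp
using System.Net;
using Microsoft.WindowsAzure;

// Workaround 1: point the storage client at the plain HTTP endpoints instead of HTTPS.
// (Connection string values are placeholders, not real credentials.)
var account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=<key>");

// Workaround 2: usurp the certificate handler so every certificate is accepted.
// This disables validation for the whole process, hence the "nasty attacks" caveat.
ServicePointManager.ServerCertificateValidationCallback =
    (sender, certificate, chain, errors) => true;
```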

Honestly though that's the only issue (that hasn't been directly my fault) I've had with Azure since I started using it at the end of last year, and compared to other cloud services it doesn't fare too badly. It has made me think about what contingency strategy I'd need should any part of the Azure infrastructure go away for an extended period of time, though. For the moment I won't worry too much as I'm not earning any income from the things I build on it, but it will definitely be a consideration as I begin to unleash my products onto the world.

 

3 Tips on Improving Azure Table Storage Performance and Reliability.

If you're a developer like me you've likely got a set of expectations about the way you handle data. Most likely they have their roots in the object-oriented/relational paradigm, meaning you expect to be able to get some insight into your data by running a few queries against it or just looking at the table, possibly sorting it to find something out. The day you decide to try out something like Azure Table storage, however, you'll find these tools simply aren't available to you any more due to the nature of the service. It's at this point that, if you're like me, you'll get a little nervous, as your data can end up feeling like something of a black box.

A while back I posted about how I was over-thinking the scalability of my Azure application and how I was about to make the move to Azure SQL. That's been my focus for the past 3 weeks or so, and what started out as the relatively simple job of moving data from one storage mechanism to another has turned into a Herculean task that has seen me dive deeper into both Azure Tables and SQL than I ever have previously. Along the way I've found out a few things that, whilst not changing my mind about the migration away from Azure Tables, certainly would have made my life a whole lot easier had I known about them.

1. If you need to query all the records in an Azure table, do it partition by partition.

The not-so-fun thing about Azure Tables is that unless you're keeping track of your data in your application there are no real metrics you can dredge up to give you an idea of what you've actually got. For me this meant that I had one table whose count I knew (due to some background processing I do using that table), but there are two others about which I had absolutely no idea how much data they actually contained. Estimates using my development database led me to believe there was an order of magnitude more data in there than I thought, which in turn led me to the conclusion that using .AsTableServiceQuery() to return the whole table was doomed from the start.

However Azure Tables isn't too bad at returning an entire partition's worth of data, even if the records number in the tens or hundreds of thousands. Sure, the query time goes up linearly with the number of records (as Azure Tables will only return a maximum of 1,000 records at a time), but if they're all within the same partition you avoid the troublesome table scan which dramatically hurts query performance, sometimes to the point of the query getting cancelled, something that isn't handled by the default RetryPolicy framework. If you need all the data in the entire table you can query each partition in turn, dump the results into a list inside your application and then run your query over that, as the sketch below shows.
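A rough sketch of what I mean, assuming the 1.x storage client, a hypothetical MyEntity class derived from TableServiceEntity, and that you know your partition keys up front (connectionString and knownPartitionKeys are assumed to be defined elsewhere):

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = CloudStorageAccount.Parse(connectionString);
var context = account.CreateCloudTableClient().GetDataServiceContext();

var allRecords = new List<MyEntity>();
foreach (var partitionKey in knownPartitionKeys)
{
    var query = context.CreateQuery<MyEntity>("MyTable")
                       .Where(e => e.PartitionKey == partitionKey)
                       .AsTableServiceQuery();

    // Execute() follows the 1,000-record continuation tokens for you,
    // so each partition comes back in full without triggering a table scan.
    allRecords.AddRange(query.Execute());
}
```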

2. Optimize your context for querying or updating/inserting records.

Unbeknownst to me, the TableServiceContext class has quite a few configuration options that allow you to change the way the context behaves. The vast majority of the errors I was experiencing came from my background processor, which primarily reads data without making any modifications to the records. If that describes your application then it's best to set the context's MergeOption to MergeOption.NoTracking, which means the context won't attempt to track the entities at all.

If you have multiple threads running, or queries that return large numbers of records, this can lead to a rather large improvement in performance: the context doesn't have to track changes to the entities and the garbage collector can free them up even if you reuse the context for another query. Of course this means that if you do need to make changes you'll have to switch the merge option back (or use a separate tracking context) and attach the entity in question, but you're probably doing that already. Or at least you should be.
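Something along these lines is what I mean; again a sketch against the 1.x client, where account, entity and the table name are placeholders:

```csharp
using System.Data.Services.Client;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Read-heavy context: no entity tracking, so large result sets can be collected cheaply.
var readContext = account.CreateCloudTableClient().GetDataServiceContext();
readContext.MergeOption = MergeOption.NoTracking;

// Write path: use a tracking context and attach the entity before updating it.
var writeContext = account.CreateCloudTableClient().GetDataServiceContext();
writeContext.AttachTo("MyTable", entity, "*"); // "*" skips the ETag check
writeContext.UpdateObject(entity);
writeContext.SaveChangesWithRetries();
```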

3. Modify your web.config or app.config file to dramatically improve performance and reliability.

For some unknown reason the default number of concurrent HTTP connections that a Windows Azure application can make (although I get the feeling this affects all applications built on the .NET Framework) is set to 2. Yes, just 2. This manifests itself as all sorts of crazy errors that don't make much sense, like "the underlying connection was closed", whenever you try to make more than 2 requests at any one time (which includes queries to Azure Tables). The maximum number of connections you should specify depends on the size of the instance you're using, but Microsoft has a helpful guide on how to set this and other settings in order to make the most of it.
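If you'd rather do it in code than in web.config, the equivalent settings live on ServicePointManager and can be set once at application start-up. A minimal sketch, where the limit of 48 is just an illustrative figure rather than Microsoft's recommendation for any particular instance size:

```csharp
using System.Net;

// Run once, early in start-up (e.g. Global.asax Application_Start or the role's OnStart).
ServicePointManager.DefaultConnectionLimit = 48;  // lift the default of 2 concurrent connections
ServicePointManager.Expect100Continue = false;    // skip the extra 100-Continue round trip
ServicePointManager.UseNagleAlgorithm = false;    // helps with the small payloads Tables/Queues send
```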

Additionally some of the guys at Microsoft have collected a bunch of tips for improving the performance of Azure Tables in various circumstances. I've cherry-picked the ones that I've confirmed have worked wonders for me, however there are a fair few more in there that might be of use to you, especially if you're looking for every performance edge you can get. Many of them are circumstantial and some require you to plan out your storage architecture in advance (so not something that can be easily retrofitted into an existing app), but since the others have worked I hazard a guess they would too.

I might not be making use of some of these tips now that my application is moving to SQL and TOPAZ, but if I can save anyone the trouble I went through sorting through all those esoteric errors I can at least say it was worth it. Some of these tips are just good to know regardless of the platform you're on (like the default HTTP connection limit) and should be incorporated into your application as soon as it's feasible. I've yet to get all my data into production as it's still migrating, but I get the feeling I might go on another path of discovery with Azure SQL in the not too distant future and I'll be sure to share my tips for it then.

Is It Wrong That I Find The President’s Surveillance Program…Intriguing?

I'm no conspiracy theorist, my feet are way too firmly planted in the world of testable observations to fall for that level of crazy, but I do love it when we the public get to see the inner workings of secretive programs, government or otherwise. Part of it is sheer voyeurism, but if I'm truthful the things that really get me are the big technical projects, things that, done without the veil of secrecy, would be wondrous in their own right. The fact that they're hidden from public view just adds to the intrigue, making you wonder why such things needed to be kept secret in the first place.

One of the first things that comes to mind is the HEXAGON series of spy satellites, high-resolution observation platforms launched during the Cold War that still rival the resolution of satellites launched today. It's no secret that all space-faring nations have fleets of satellites up there for such purposes, but the fact that the USA was able to keep the exact nature of the entire program secret for so long is quite astounding. The technology behind it was what really intrigued me though, as it was years ahead of the curve in terms of capabilities, even if it didn't have the longevity of its fully digital progeny.

Yesterday however a friend sent me this document from the Electronic Frontier Foundation which provides details on something called the President's Surveillance Program (PSP). I was instantly intrigued.

According to William Binney, a former technical director at the National Security Agency, the PSP is in essence a massive data gathering program with possible intercepts at all major fibre terminations within the USA. The system simply siphons off all incoming and outgoing data, which is then stored in massive, disparate data repositories. This in itself is a mind-boggling endeavour as the amount of data that transits the Internet in a single day dwarfs the capacity of most large data centres. The NSA then ramps it up a notch by being able to recover files, emails and all sorts of other data based on keywords and pattern matching, which implies heuristics on a level that's simply mind-blowing. Of course this is all I've got to go on at the moment but the idea itself is quite intriguing.

For starters, creating a network that's able to handle a direct tap on a fibre connection is no small feat in itself. When the fibres terminating at the USA's borders are capable of speeds in the GB/s range the required infrastructure to handle that is non-trivial, especially if you want to store the data afterwards. Storing that amount of data is another matter entirely as most commercial arrays begin to tap out in the petabyte range. Binney's claims start to seem a little far-fetched here as he states there are plans up into the yottabyte range, but he concedes that current incarnations of the program couldn't hold more than tens of exabytes. Barring some major shake-up in the way we store data I can't fathom how they'd manage to create an array that big. Then again I don't work for the NSA.

As intriguing as such a system might be, there's no question that its existence is a major violation of privacy for US citizens and the wider world. Such a system is akin to tapping every single phone and recording every conversation on it, which is most definitely not supported by their current legal system. Just because they don't use the data until they have a reason to doesn't make it just either, as all data gathered without suspicion of guilt or intent to commit a crime is illegitimate. I could think of many legitimate uses for the data (anonymised analytics could prove very useful) but the means by which it was gathered rules out any purpose being legitimate.

The Hybrid Cloud Paradigm Clash.

Maybe it's my corporate IT roots but I've always thought the best cloud strategy would be a combination of in-house resources with the ability to offload elsewhere when extra capacity was required. Such a deployment would mean organisations could design their systems around base loads and have the peaks handled by public clouds, saving them quite a bit of cash whilst still delivering services at an acceptable level. It would also gel well with management types, as not many are completely comfortable being totally reliant on a single provider for any particular service, which in light of recent cloud outages is quite prudent. For my part I was more interested in setting up a few Azure instances so I could test my code against the real thing rather than the emulator that comes with Visual Studio, as I've always found there are certain gotchas that don't show up until you're running on a real instance.

Now the major cloud providers, Rackspace, AWS et al., haven't really expressed much interest in supporting configurations like this, which makes business sense for them since doing so would more than likely eat into their sales. They could license the technology of course, but that brings with it a whole bunch of other problems, like defining supported configurations and giving up some measure of control over the platform so that end users can deploy their own nodes. However I had long thought that Microsoft, which has a long history of letting users install its software on their own hardware, would eventually allow Azure to run in some scaled-down fashion to facilitate this hybrid cloud idea.

Indeed many developments in their Azure product seemed to support this, the strongest of which being the VM role, which allowed you to build your own virtual machine and then run it on their cloud. Microsoft have also offered their Azure Appliance product for a while, allowing large-scale companies and providers the opportunity to run Azure on their own premises. Taking all this into consideration you'd think Microsoft wasn't too far away from offering a solution for medium-sized organisations and developers who wanted to go to the Azure platform but also wanted to maintain some form of control over their infrastructure.

After talking with a TechEd-bound mate of mine, however, it seems that idea is off the table.

VMware has had their hybrid cloud product (vCloud) available for quite some time and whilst it satisfies most of the things I've been talking about so far it doesn't have the sexy cloud features like an in-built scalable NoSQL database or binary object storage. Since Microsoft had their Azure product I had assumed they weren't interested in competing with VMware on the same level, but after seeing one of the TechEd classes and subsequently browsing their cloud site it looks like they're launching SCVMM 2012 as a direct competitor to vCloud. This means Microsoft is taking the same route of letting you build your own private cloud, which is essentially just a large pool of shared resources, forgoing any implementation of the features that make Azure so gosh darn sexy.

Figuring that out left me a little disappointed, but I can understand why they’re doing it.

Azure, as great as I think it is, probably doesn't make sense in a deployment scenario of anything less than a couple of hundred nodes. Much of Azure's power, like that of any cloud provider, comes from its large number of distributed nodes, which provide redundancy, flexibility and high performance. The Hyper-V based private cloud, then, is more tailored to the lower end where enterprises likely want more control than Azure would provide, not to mention that experience in deploying Azure instances is limited to Microsoft employees and precious few others from the likes of Dell, Fujitsu and HP. Hyper-V is therefore the better solution for those looking to deploy a private cloud, and should they want to burst out to a public cloud they'll just have to code their application to be able to do that. Such a feature isn't impossible, but it is an additional cost that will need to be considered.

Goodbye, My Sweet Optical Drive.

I’ve been drooling over the specifications of my next computer for well over a month now, tweaking bits here and there to ensure that the PC I end up building will provide the best value for money I can get. Sure there are a few extravagances in it like the Corsair H70 water cooling kit and the Razer Megasoma mouse pad but otherwise it’s a very respectable rig that will serve me well over the course of the next few years. The initial design I had in my head however failed to account for a few of the real world issues that actually building this system would entail, forcing me to make some tough decisions.

Firstly the case I currently use, a Lian Li PC-B20B, has a drive cage that only fits 4 hard drives. Sure, I'd probably be able to stuff one in the floppy bay but it's far from an ideal solution, and it just so happens that the perfect place for the water cooling kit is right smack bang where the hard drive cage currently sits. I'm not sure how I stumbled across it but I saw this awesome product from Lian Li, the EX-34NB, which converts 3 of the front drive bays into 4 internal hard drive bays, complete with a fan. It was the perfect solution to my dilemma, allowing the 4 storage drives and the water cooling solution to live together in my case in perfect harmony.

Of course then I asked myself the question, where would the SSD go?

The obvious choice would be in the floppy slot, since I have 2 of them and neither is getting used, but I may have to remove that cage to fit the water cooler (it looks to be a tight fit from the measurements). Additionally the motherboard I'm looking at going with, the ASRock P67 Extreme6, comes with a nifty front bay adapter for a couple of USB 3.0 ports that doubles as an SSD mounting kit. This means, though, that I'd have to give up one of the longest-lived components I've owned, one I've kept for the better part of a decade: my dual layer DVD burner.

I couldn't tell you exactly when I bought it but I do know I shelled out a good $200+ for my little IDE burner, top of the line for its time. I can tell you one of the primary reasons I bought it, however: it came with a black bezel that matched my gigantic black case perfectly. It was the perfect little workhorse and whilst its dual layer abilities were only used a couple of times, when I forayed into the dark world of Xbox 360 "backups", it still burnt many a DVD for me without complaint. It had also developed a curious little quirk over the years, opening with such force that it thought someone had pushed it back in after it had opened, causing it to promptly close again. Still it functioned well for what I needed and it stayed with me through 2 full computer upgrades.

Thinking back over the past year or so I can only recall a few times that I ever really needed to burn a DVD, most of the time being able to cope quite well with my trusty little flash drive or network shares. Indeed many of the games that I bought either had a digital distribution option or were copied to my hard drive before I attempted to install them. Whilst I'd be sad to see the one component that's been a constant in my computing life for such a long time go, I really can't see a need for it anymore, especially when it's taking up a potential mounting spot for my future SSD.

That’s not to say I think that optical media and their respective hardware are dead though, far from it. Whilst the cost of flash drives has come down significantly over the past decade they’re still an order of magnitude more expensive to produce than an optical disc. Indeed even in the lucrative server markets nearly all vendors still provide their updates and tools on CDs simply because the cost of doing so on a flash drive is just too high. Sure if you included the cost of the drive in that whole equation that might change matters slightly but like the floppy drive before it we’ve still got a good decade or so before optical media will be phased out of normal use, although it will still hang on for a long time to come.

It was an interesting realization for me to come to since optical media is the first format I've witnessed being born, gaining mainstream adoption and then beginning to fade into obsolescence. Of course I'm still a long way from being rid of optical drives completely, my PC will be one of only 2 PCs in my house not to have an attached optical drive, but it is a signal that things are moving on and that its replacement, flash media, is ready to take the helm.

I’ll have to find a fitting home for my long time pal, probably in the media PC where he’ll get used every so often.

Microsoft's Black Magic: A Double-Edged Sword.

I'm a really big fan of Microsoft's development tools. No other IDE I've used to date can hold a candle to the mighty Visual Studio, especially when you couple it with things like ReSharper and the massive online communities dedicated to overcoming any shortcomings you might encounter along the way. The same communities are also responsible for developing many additional frameworks that extend the Microsoft platforms even further, with many of them making their way into official SDKs. There have only been a few times when I've found myself treading ground with Microsoft tools that no one has trodden before, but every time I have I've discovered so much more than I initially set out to.

I’ve come to call these encounters “black magic moments”.

You see, with the ease of developing against a large range of solutions already laid out for you, it becomes quite tempting to slip into the habit of seeking out a completed solution rather than building one of your own. Indeed a few design decisions in my previous applications were driven by this, mostly because I didn't want to dive under the hood of those solutions to develop the fix for my particular problem. It's quite surprising how far you can get into developing something by doing this, but eventually the decisions you make will corner you into a place where you have to choose between doing some real development or scrapping a ton of work. Microsoft's development ideals seem to encourage the latter (in favor of using one of their tried and true solutions) but stubborn engineers like me hate having to do rework.

This of course means diving beneath the surface of Microsoft's black boxes and poking around to get an idea of what the hell is going on. My first real attempt at this was back in the early days of the Lobaco code base, when I had decided that everything should be done via JSON. Everything was working out quite well until I started trying to POST a JSON object to my web service, whereupon it would throw out all sorts of errors about not being able to deserialize the object. I spent the better part of 2 days trying to figure that problem out and got precisely nowhere, eventually posting my frustrations to the Silverlight forums. Whilst I didn't get the actual answer from there they did eventually lead me down a path that got me to it, but the solution is not documented anywhere, nor does it seem that anyone else has attempted such a feat before (or since for that matter).

I hit another Microsoft black magic moment when I was working on my latest project, which I had decided would be entirely cloud based. After learning my way around the ins and outs of the Windows Azure platform I took it upon myself to migrate the default authentication system built into ASP.NET MVC 3 onto Microsoft's cloud. Thanks to a couple of handy tutorials the process seemed fairly easy, so I set about my task, converting everything into the cloud. However upon attempting to use the thing I had just created I was greeted with all sorts of random errors and no amount of massaging the code would set it straight. After the longest time I found that it came down to a nuance of the Azure Tables storage part of Windows Azure, namely the way it structures data.

In essence Azure Tables is one of them newfangled NoSQL-type databases and as such it relies on a couple of properties in your object class to uniquely identify a row and provide scalability. These two properties are called PartitionKey and RowKey, and whilst you can leave them alone and your app will still work, it won't be able to leverage any of the cloud goodness. So in my implementation I had overridden these properties in order to get the scalability that I wanted but had neglected to include any setters for them. This didn't seem to be a problem when storing objects in Azure Tables, but when querying them it seems that Azure requires the setters to be there, even if they do nothing at all. Adding them in fixed nearly every problem I was encountering and brought me back to another problem I had faced in the past (more on that when I finally fix it!).
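To illustrate the shape of the fix, here's a hypothetical sketch against the 1.x storage client rather than my actual authentication code; UserEntity and the partitioning scheme are made up:

```csharp
using Microsoft.WindowsAzure.StorageClient;

public class UserEntity : TableServiceEntity
{
    public string UserName { get; set; }

    // PartitionKey is derived from another property to spread rows across partitions.
    public override string PartitionKey
    {
        get { return UserName.Substring(0, 1).ToLowerInvariant(); }
        set { } // the setter must exist for queries to hydrate the entity, even though it does nothing
    }

    public override string RowKey
    {
        get { return UserName; }
        set { } // same here
    }
}
```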

Like any mature framework that does a lot of the heavy lifting for you, Microsoft's solutions suffer when you start to tread unknown territory. Realistically though this should be expected, and I've found I spend the vast majority of my time on less than 20% of the code that ends up making the final solution. The upshot of course is that once these barriers are down progress accelerates at an extremely rapid pace, as I saw with both the Silverlight and iPhone clients for Lobaco. My cloud authentication services are nearly ready for prime time, and since I struggled so much with this I'll be open sourcing my solution so that others can benefit from the numerous hours I spent on the problem. It will be my first ever attempt at open sourcing something I created and the prospect both thrills and scares me, but I'm looking forward to giving back a little to the communities that have given me so much.