The never-ending quest to satisfy Moore’s Law means that we’re always looking for ways to making computers faster and cheaper. Primarily this focuses on the brain of the computer, the Central Processing Unit (CPU), which in most modern computers is now how to transistors numbering in the billions. All the other components haven’t been resting on their laurels however as shown by the radical improvement in speeds from things like Solid State Drives (SSDs), high-speed interconnects and graphics cards that are just as jam-packed with transistors as any CPU is. One aspect that’s been relatively stagnant however has been RAM which, whilst increasing in speed and density, has only seen iterative improvements since the introduction of the first Double Data Rate (DDR). Today Intel and Micron have announced 3D Xpoint, a new technology that sits somewhere between DRAM and NAND in terms of speed.
Details on the underlying technology are a little scant at the moment however what we do know is that instead of storing information by trapping electrons, like all memory currently does, 3D Xpoint (pronounced cross point) instead stores bits via a change in resistance of the memory material. If you’re like me you’d probably think that this was some kind of phase change memory however Intel has stated that it’s not. What they have told us is that the technology uses a lattice structure which doesn’t require transistors to read and write cells, allowing them to dramatically increase the density, up to 128GB per die. This also comes with the added benefit of being much faster than current NAND technologies that power SSDs although slightly slower than current DRAM, albeit with the added advantage of being non-volatile.
Unlike most new memory technologies which often purport to be the replacements for one type of memory or another Intel and Micron are position 3D Xpoint as an addition to the current architecture. Essentially your computer has several types of memory, all of which are used for a specific purpose. There’s memory directly on the CPU which is incredibly fast but very expensive, so there’s only a small amount. The second type is the RAM which is still fast but can be had in greater amounts. The last is your long term storage, either in the form of spinning rust hard drives or a SSD. 3D Xpoint would sit in between the last two, providing a kind of high speed cache that could hold onto often used data that’s then persisted onto disk. Funnily enough the idea isn’t that novel, things like the XboxOne use a similar architecture, so there’s every chance that it might end up happening.
The reason why this is exciting is because Intel and Micron are already going into production with these new chips, opening up the possibility of a commercial product hitting our shelves in the very near future. Whilst integrating it in the way that they’ve stated in the press release would take much longer, due to the change in architecture, there’s a lot of potential for a new breed of SSD drives to be based on this technology. They might be an order of magnitude more expensive than current SSDs however there are applications where you can’t have too much speed and for those 3D Xpoint could be a welcome addition to their storage stack.
Considering the numerous technological announcements we’ve seen from other large vendors that haven’t amounted to much it’s refreshing to see something that could be hitting the market in short order. Whilst Intel and Micron are still being mum on the details I’m sure that the next few months will see more information make its way to us, hopefully closely followed by demonstrator products. I’m very interested to see what kind of tech is powering the underlying cells as a non-phase change, resistance based memory is something that would be truly novel and, once production hits at-scale levels, could fuel another revolution akin to the one we saw with SSDs all those years ago. Needless to say I’m definitely excited to see where this is heading and I hope Intel and Micron keep us in the loop with the new developments.
Of all the PC upgrades that I’ve ever done in the past the one that’s most notably improved performance of my rig is, by a wide margin, installing a SSD. Whilst good old fashioned spinning rust disks have come a long way in recent years in terms of performance they’re still far and away the slowest component in any modern system. This is what chokes most PC’s performance as the disk is a huge bottleneck, slowing everything down to its pace. The problem can be mitigated somewhat by using several disks in a RAID 0 or RAID 10 set but all of those pale in comparison when compared to even a single SSD.
The problem doesn’t go away for the server environment either, in fact most of the server performance problems I’ve diagnosed have had their roots in poor disk performance. Over the years I’ve discovered quite a few tricks to get around the problems presented by traditional disk drives but there are just some limitations you can’t overcome. Recently at work the issue of disk performance came to a head again as we investigated the possibility of using blade servers in our environment. I casually made mention of a company that I had heard of a while back, Fusion-IO, who specialised in making enterprise class SSDs. The possibility of using one of the Fusion-IO cards as a massive cache for the slower SAN disk was a tantalizing prospect and to my surprise I was able to snag an evaluation unit in order to put it through its paces.
The card we were sent was one of the 640GB ioDrives. It’s surprising heavily for its size, sporting gobs of NAND flash and a massive heat sink that hides the propeitary c ontroller. What intrigued me about the card initially was the NAND didn’t sport any branding I recognised before (usually its recognisable like Samsung) but as it turns out each chip is a 128GB Micron NAND Flash chip. If all that storage was presented raw it would total some 3.1 TB and this is telling of the underlying infrastructure of the Fusion-IO devices.
The total storage available to the operating system once this card is installed is around 640GB (600GB usable). Now to get that kind of storage out of the Micron NAND chips you’d only need 5 of them but the ioDrive comes with a grand total of 25 dotting the board. No traditional RAID scheme can account for the amount of storage presented. So based on the fact that there’s 25 chips and only 5 chips worth of capacity available it follows that the Fusion-IO card uses quintuplet sets of chips to provide the high level of performance that they claim. That’s an incredible amount of parallelism and if I’m honest I expected these chips to all be 256MB chips that were all RAID 1 to make one big drive.
Funnily enough I did actually find some Samsung chips on this card, two 1GB DDR2 chips. These are most likely used for the CPU on the ioDrive which has a front side bus of either 333 or 400MHz based on the RAM speed.
But enough of the techno geekery, what’s really important is how well this thing performs in comparison to traditional disks and whether or not it’s worth the $16,000 price tag that comes along with it. Now I had done some extensive testing of various systems in the past in order to ascertain whether the new Dell servers we were looking at where going to perform as well as their HP counterparts. All of this testing was purely disk based using IOMeter, a disk load simulator that tests and reports on nearly every statistic you want to know about your disk subsystem. If you’re interested in replicating the results I’ve got then I’ve uploaded a copy of my configuration file here. The servers included in the test are Dell M610x, Dell M710HD, Dell M910, Dell R710 and a HP DL380G7. For all the tests (bar the two labelled local install) all of them are a base install of ESXi 5 with a Windows 2008R2 virtual machine installed on top of it. The specs of the virtual machine are 4 vCPUs, 4GB RAM and a 40GB disk.
As you can see the ioDrive really is in a class all of its own. The only server that comes close in terms of IOPS is the M910 and that’s because it’s sporting 2 Samsung SSDs in RAID 0. What impresses me most about the ioDrive though is its random performance which manages to stay quite high even as the block size starts to get bigger. Although its not shown in these tests the one area where the traditional disks actually equal the Fusion-IO is in terms of throughput when you get up to really large write sizes, on the order of 1MB or so. I put this down to the fact that the servers in question, the R710s and DL380G7s, have 8 disks in them that can pump out some serious bandwidth when they need to. If I had 2 Fusion-IO cards though I’m sure I could easily double that performance figure.
What interested me next was to see how close I could get to the spec sheet performance. The numbers I just showed you are particularly incredible but Fusion-IO claims that this particular drive was capable of something on the order of 140,000 IOPS if I played my cards correctly. Using the local install of Windows 2008 I had on there I fired up IOMeter again and set up some 512B tests to see if I could get close to those numbers. The results, as shown in the Dell IO contoller software, are shown below:
Ignoring the small blip in the centre where I had to restart the test you can see that whilst the ioDrive is capable of some pretty incredible IO the advertised maximums are more than likely theoretical than practical. I tried several different tests and while a few averaged higher than this (approximately 80K IOPS was my best) it was still a far cry from the figures they have quoted. Had they gotten within 10~20% I would’ve given it to them but whilst the ioDrive’s performance is incredible it’s not quite as incredible as the marketing department would have you believe.
As a piece of hardware the Fusion-IO ioDrive is really the next step up in terms of performance. The virtual machines I had running directly on the card were considerably faster than their spinning rust counterparts and if you were in need of some really crazy performance you really couldn’t go past one of these cards. For the purpose we had in mind for it however (putting it inside a M610x blade) I can’t really recommend it as it’s a full height blade that only has the power of a half height. The M910 represents much better value with its crazy CPU and RAM count and the SSDs, whilst being far from Fusion-IO level, do a pretty good job of bridging the disk performance gap. I didn’t have enough time to see how it would improve some real world applications (it takes me longer than 10 days to get something like this into our production environment) but based on these figures I have no doubt it improve the performance of whatever I put it into considerably.