The time has long since past when a computer manufacturer could get by on shipping tin. The margins on computer equipment are so low that, most of the time, the equipment they sell is just a loss leader for another part of the business. Nowadays the vast majority of most large computer company’s revenue comes from their services division, usually under the guise of providing the customer a holistic solution rather than just another piece of tin. Thus for many companies the past couple decades have seen them transform from pure hardware businesses into more services focused companies, with several attempting more radical transformations in order to stay relevant. HP has become the most recent company to do this, announcing that they will be splitting the company in half.
HP will now divest itself into 2 different companies. The first will be Hewlett Packard Enterprise comprising of their server market, services branch and software group. The second will be purely consumer focused, comprising of their personal computer business and their printing branch. If you were going to split a PC business this is pretty much how you’d do it as whilst these functions are somewhat complimentary to each other (especially if you want to be the “end to end” supplier for all things computing) there’s just as many times when they’re at odds. HP’s overarching strategy with this split is to have two companies that can be more agile and innovative in their respective markets and, hopefully, see better margins because of it.
When I first heard the rumours swirling about this potential split the first question that popped into my head was “Where is the services business going?”. As I alluded to before the services business is the money maker for pretty much every large PC manufacturer these days and in this case the enterprise part of HP has come away with it. The numbers only give a slight lead to the new enterprise business in terms of revenue and profit however with the hardware business has been on a slow decline for the past few years which, if I’m honest, paints a bleak picture for HP Inc. going forward. There’s nothing to stop them from developing a services capability (indeed parts of the consumer business already have that) however in its current form I’d put my money on HP Inc. being the one who’s worse off out of this deal.
That could change however if HP’s rhetoric has some merit to it. HP, as it stands today, is an amalgamation of dozens of large companies that it acquired over the years and whilst they all had a similar core theme of being in the IT business there really wasn’t a driving overarching goal for them to adhere to. The split gives them an opportunity to define that more clearly for each of the respective companies, allowing them to more clearly define their mission within each of their designated market segments. Whether that will translate into the innovation and agility that they’re seeking is something we’ll have to see as this is yet another unprecedented change from a large IT conglomerate.
As someone who’s been involved in the IT industry for the better part of 2 decades now the amount of change that’s happened in the last couple years has been, honestly, staggering. We’ve seen IBM sell off some of its core manufacturing capability (the one no one got fired for buying), Dell buy back all its stock to become a private company again and now HP, the last of the 3 PC giants, divest itself into 2 companies. It will likely take years before all the effects of these changes are really felt but suffice to say that the PC industry of the future will look radically different to that of the past.
FULL DISCLOSURE: The writer is a current employee of Dell. All opinions expressed in this article are of the writer’s own and are not representative of Dell.
I’ve worked with a lot of different hardware in my life, from the old days of tinkering with my Intel 80286 through to esoteric Linux systems running on DEC tin until I, like everyone else in the industry, settled on x86-64 as the defacto standard. Among the various platforms I was happy to avoid (including such lovely things as Sun SPARC) was Intel’s Itanium range as it’s architecture was so foreign from anything else it was guaranteed that whatever you were trying to do, outside of building software specifically for that platform, was doomed to failure. The only time I ever came close to seeing it being deployed was on the whim of a purchasing manager who needed guaranteed 100% uptime until they realised the size of the cheque they’d need to sign to get it.
If Intel’s original dream was to be believed then this post would be coming to you care of their processors. You see back when it was first developed everything was still stuck in the world of 32bit and the path forward wasn’t looking particularly bright. Itanium was meant to be the answer to this, with Intel’s brand name and global presence behind it we would hopefully see all applications make their migration to the latest and greatest 64bit platform. However the complete lack of any backwards compatibility with any currently developed software and applications meant adopting it was a troublesome exercise and was a death knell for any kind of consumer adoption. Seeing this AMD swooped in with their dually compatible x86-64 architecture which proceeded to spread to all the places that Itanium couldn’t, forcing Intel to adopt the standard in their consumer line of hardware.
Itanium refused to die however finding a home in the niche high end market due to its redundancy features and solid performance for optimized applications. However the number of vendors continuing to support the platform dwindled from their already low numbers with it eventually falling to HP being the only real supplier of Itanium hardware in the form of their NonStop server line. It wasn’t a bad racket for them to keep up though considering the total Itanium market was something on the order of $4 billion a year and with only 55,000 servers shipped per year you can see how much of a premium they attract). Still all the IT workers of the world have long wondered when Itanium would finally bite the dust and it seems that that day is about to come.
HP has just announced that it will be transitioning its NonStop server range from Itanium to x86 effectively putting an end to the only sales channel that Intel had for their platform. What will replace it is still up in the air but it’s safe to assume it will be another Intel chip, likely one from their older Xeon line that shares many of the features that the Itanium had without the incompatible architecture. Current Itanium hardware is likely to stick around for an almost indefinite amount of time however due to the places it has managed to find itself in, much to the dismay of system administrators everywhere.
In terms of accomplishing it’s original vision Itanium was an unabashed failure, never finding the consumer adoption that it so desired and never becoming the herald of 64bit architecture. Commercially though it was somewhat of a success thanks to its features that made it attractive to the high end market but even then it was only a small fraction of total worldwide server sales, barely enough to make it a viable platform for anything but wholly custom solutions. The writing was on the wall when Microsoft said that Windows Server 2008 was the last version to support it and now with HP bowing out the death clock for Itanium has begun ticking in earnest, even if the final death knell won’t come for the better part of a decade.
On the surface this blog hasn’t changed that much. The right hand column had shifted around a bit as I added and subtracted various bits of social integration but for the most part the rest of the site remained largely static. Primarily this was due to laziness on my part as whilst I always wanted to revamp it I could just never find the motivation, nor the right design, to spur me on. However after a long night spent perusing various WordPress theme sites I eventually came across one I liked but it was a paid one and although I’m not one to shy away from paying people for their work it’s always something of a barrier. I kept the page open in Chrome and told myself that when it came time to move servers that’d be the time I’d make the switch.
And yesterday I did.
My previous provider, BurstNET, whilst being quite amazing at the start slowly started to go downhill as of late. Since I’d been having a lot of issues, mostly of my own doing, I had enlisted Pingdom to track my uptime and the number of reports I got started to trend upwards. For the most part it didn’t affect me too much as most of the outages happened outside my prime time however it’s never fun to wake up to an inbox full of alerts so I decided it was time to shift over to a new provider. I had had my eye on Digital Ocean for a while as they provide SSD backed VPSs, something which I had investigated last year but was unable to find at a reasonable price. Plus their plans are extraordinarily cheap for what you get with this site coming to you via their $20/month plan. Set up was a breeze too, even though it seems every provider has their own set of quirks built into their Ubuntu images.
The new theme is BlogTime from ThemeForest and I chose it precisely because it’s the only one I could find that emulates the style you get when you login to WordPress.com (with those big featured images at the top with a nice flat layout). The widgets he provides with the theme unfortunately don’t seem to work, at least not in the way that’s advertised, so I had to spend some time wrestling with the Facebook and Twitter widget APIs to get them looking semi-decent on the sidebar. Thankfully it seems the “dark” theme on both sites seems to match the background on here quite well otherwise I would’ve had to do a whole bunch of custom CSS stuff that I just wasn’t in the mood for last night. Probably the coolest thing about this theme is that it automatically resizes itself depending on what kind of device you have so this blog should look pretty much the same no matter how you’re viewing it.
I also took the opportunity to try and set up caching again and whilst it appeared to work great last night upon attempting to load my site this morning I found that I was greeted with an empty response back. Logging into the WordPress dashboard directly seemed to solve this but I’m not quite sure what caused W3 Total Cache to cause my site to serve nothing for the better part of 5 hours. For the moment I’ve disabled it as the site appears to be running quite fine without it but I’ll probably attempt to get one of them running again in the future as when they’re working they really are quite good.
Does this change in face mean there’s going to be a radical change in what this site is about? I’m not intending to as whilst my traffic has been flagging of late (and why that is I couldn’t tell you) this was more a revamp that was long overdue. I’d changed servers nearly once a year however I had not once changed the theme (well unless you count the Ponies incident) and it was starting to get a little stale, especially considering it seemed to be the theme of choice for a multitude of other tech blogs I visited. So really all that’s changed is the look and the location that this blog is coming to you from, everything else is pretty much the same, for better or for worse.
Of all the PC upgrades that I’ve ever done in the past the one that’s most notably improved performance of my rig is, by a wide margin, installing a SSD. Whilst good old fashioned spinning rust disks have come a long way in recent years in terms of performance they’re still far and away the slowest component in any modern system. This is what chokes most PC’s performance as the disk is a huge bottleneck, slowing everything down to its pace. The problem can be mitigated somewhat by using several disks in a RAID 0 or RAID 10 set but all of those pale in comparison when compared to even a single SSD.
The problem doesn’t go away for the server environment either, in fact most of the server performance problems I’ve diagnosed have had their roots in poor disk performance. Over the years I’ve discovered quite a few tricks to get around the problems presented by traditional disk drives but there are just some limitations you can’t overcome. Recently at work the issue of disk performance came to a head again as we investigated the possibility of using blade servers in our environment. I casually made mention of a company that I had heard of a while back, Fusion-IO, who specialised in making enterprise class SSDs. The possibility of using one of the Fusion-IO cards as a massive cache for the slower SAN disk was a tantalizing prospect and to my surprise I was able to snag an evaluation unit in order to put it through its paces.
The card we were sent was one of the 640GB ioDrives. It’s surprising heavily for its size, sporting gobs of NAND flash and a massive heat sink that hides the propeitary c ontroller. What intrigued me about the card initially was the NAND didn’t sport any branding I recognised before (usually its recognisable like Samsung) but as it turns out each chip is a 128GB Micron NAND Flash chip. If all that storage was presented raw it would total some 3.1 TB and this is telling of the underlying infrastructure of the Fusion-IO devices.
The total storage available to the operating system once this card is installed is around 640GB (600GB usable). Now to get that kind of storage out of the Micron NAND chips you’d only need 5 of them but the ioDrive comes with a grand total of 25 dotting the board. No traditional RAID scheme can account for the amount of storage presented. So based on the fact that there’s 25 chips and only 5 chips worth of capacity available it follows that the Fusion-IO card uses quintuplet sets of chips to provide the high level of performance that they claim. That’s an incredible amount of parallelism and if I’m honest I expected these chips to all be 256MB chips that were all RAID 1 to make one big drive.
Funnily enough I did actually find some Samsung chips on this card, two 1GB DDR2 chips. These are most likely used for the CPU on the ioDrive which has a front side bus of either 333 or 400MHz based on the RAM speed.
But enough of the techno geekery, what’s really important is how well this thing performs in comparison to traditional disks and whether or not it’s worth the $16,000 price tag that comes along with it. Now I had done some extensive testing of various systems in the past in order to ascertain whether the new Dell servers we were looking at where going to perform as well as their HP counterparts. All of this testing was purely disk based using IOMeter, a disk load simulator that tests and reports on nearly every statistic you want to know about your disk subsystem. If you’re interested in replicating the results I’ve got then I’ve uploaded a copy of my configuration file here. The servers included in the test are Dell M610x, Dell M710HD, Dell M910, Dell R710 and a HP DL380G7. For all the tests (bar the two labelled local install) all of them are a base install of ESXi 5 with a Windows 2008R2 virtual machine installed on top of it. The specs of the virtual machine are 4 vCPUs, 4GB RAM and a 40GB disk.
As you can see the ioDrive really is in a class all of its own. The only server that comes close in terms of IOPS is the M910 and that’s because it’s sporting 2 Samsung SSDs in RAID 0. What impresses me most about the ioDrive though is its random performance which manages to stay quite high even as the block size starts to get bigger. Although its not shown in these tests the one area where the traditional disks actually equal the Fusion-IO is in terms of throughput when you get up to really large write sizes, on the order of 1MB or so. I put this down to the fact that the servers in question, the R710s and DL380G7s, have 8 disks in them that can pump out some serious bandwidth when they need to. If I had 2 Fusion-IO cards though I’m sure I could easily double that performance figure.
What interested me next was to see how close I could get to the spec sheet performance. The numbers I just showed you are particularly incredible but Fusion-IO claims that this particular drive was capable of something on the order of 140,000 IOPS if I played my cards correctly. Using the local install of Windows 2008 I had on there I fired up IOMeter again and set up some 512B tests to see if I could get close to those numbers. The results, as shown in the Dell IO contoller software, are shown below:
Ignoring the small blip in the centre where I had to restart the test you can see that whilst the ioDrive is capable of some pretty incredible IO the advertised maximums are more than likely theoretical than practical. I tried several different tests and while a few averaged higher than this (approximately 80K IOPS was my best) it was still a far cry from the figures they have quoted. Had they gotten within 10~20% I would’ve given it to them but whilst the ioDrive’s performance is incredible it’s not quite as incredible as the marketing department would have you believe.
As a piece of hardware the Fusion-IO ioDrive is really the next step up in terms of performance. The virtual machines I had running directly on the card were considerably faster than their spinning rust counterparts and if you were in need of some really crazy performance you really couldn’t go past one of these cards. For the purpose we had in mind for it however (putting it inside a M610x blade) I can’t really recommend it as it’s a full height blade that only has the power of a half height. The M910 represents much better value with its crazy CPU and RAM count and the SSDs, whilst being far from Fusion-IO level, do a pretty good job of bridging the disk performance gap. I didn’t have enough time to see how it would improve some real world applications (it takes me longer than 10 days to get something like this into our production environment) but based on these figures I have no doubt it improve the performance of whatever I put it into considerably.
Back when virtualization was just starting to make headway into the corporate IT market the main aim of the game was consolidation. Vast quantities of CPU, memory and disk resources were being squandered as servers sat idle for the vast majority of their lives, barely ever using the capacity that was assigned to them. Virtualization allowed IT shops the ability to run many low resource servers on the one box, significantly reducing the hardware requirement cost whilst providing a whole host of other features. It followed then that administrators looked towards over-provisioning their hosts, I.E. creating more virtual machines than the host was technically capable of handling.
The reason this works is because of a feature of virtualization platforms called scheduling. In essence when you put a virtual machine on an over-provisioned host it will not be guaranteed to get resources when it needs them, instead it’s scheduled on and in order to keep it and all the other virtual machines running properly. Surprisingly this works quite well as for the most part virtual machines spend a good part of their life idle and the virtualization platform uses this information to schedule busy machines ahead of idle ones. Recently I was approached to find out what the limits were of a new piece of hardware that we had procured and I’ve discovered some rather interesting results.
The piece of kit in question is a Dell M610x blade server with the accompanying chassis and interconnects. The specifications we got were pretty good being a dual processor arrangement (2 x Intel Xeon X5660) with 96GB of memory. What we were trying to find out was what kind of guidelines should we have around how many virtual machines could comfortably run on such hardware before performance started to degrade. There was no such testing done with previous hardware so I was working in the dark on this one, so I’ve devised my own test methodology in order to figure out the upper limits of over-provisioning in a virtual world.
The primary performance bottleneck for any virtual environment is the disk subsystem. You can have the fastest CPUs and oodles of RAM and still get torn down by slow disk. However most virtual hosts will use some form of shared storage so testing that is out of the equation. The two primary resources we’re left with then are CPU and memory and the latter is already a well known problem space. However I wasn’t able to find any good articles on CPU over-provisioning so I devised some simple tests to see how the systems would perform when under a load that was well above its capabilities.
The first test was a simple baseline, since the server has 12 available physical cores (HyperThreading might say you get another core, but that’s a pipe dream) I created 12 virtual machines each with a single core. I then fully loaded the CPUs to max capacity. Shown below is a stacked graph of each virtual machine’s ready time which is a representation of how long the virtual machine was ready¹ to execute some instruction but was not able to get scheduled onto the CPU.
The initial part of this graph shows the machines all at idle. Now you’d think at that stage that their ready times would be zero since there’s no load on the server. However since VMware’s hypervisor knows when a virtual machine is idle it won’t schedule it on as often as the idle loops are simply wasted CPU cycles. The jumpy period after that is when I was starting up a couple virtual machines at a time and as you can see those virtual machine’s ready times drop to 0. The very last part of the graph shows the ready time rocketing down to nothing for all the virtual machines with the top grey part of the graph being the ready time of the hypervisor itself.
This test doesn’t show anything revolutionary as this is pretty much the expected behaviour of a virtualized system. It does however provide us with a solid baseline from which we can draw some conclusions from further tests. The next test I performed was to see what would happen when I doubled the work load on the server, increasing the virtual core count from 12 to a whopping 24.
For comparison’s sake the first graph’s peak is equivalent to the first peak of the second graph. What this shows is that when the CPU is oversubscribed by 100% the CPU wait times rocket through the roof with the virtual machines waiting up to 10 seconds in some cases to get scheduled back onto the CPU. The average was somewhere around half a second which for most applications is an unacceptable amount of time. Just imagine trying to use your desktop and having it freeze for half a second every 20 seconds or so, you’d say it was unusable. Taking this into consideration we now know that there must be some level of happy medium in the centre. The next test then aimed right bang in the middle of these two extremes, putting 18 CPUs on a 12 core host.
Here’s where it gets interesting. The graph depicts the same test running over the entire time but as you can see there are very distinct sections depicting what I call different modes of operation. The lower end of the graph shows a time when the scheduler is hitting bang on its scheduling and the wait times are overall quite low. The second is when the scheduler gives much more priority to the virtual machines that are thrashing their cores and the machines that aren’t doing anything get pushed to the side. However in both instances the 18 cores running are able to get the serviced in a maximum of 20 milliseconds or so, well within the acceptable range of most programs and user experience guidelines.
Taking this all into consideration it’s then reasonable to say that the maximum you can oversubscribe a virtual host in regards to CPU is 1.5 times the number of physical cores. You can extrapolate that further by taking into consideration the average load and if it’s below 100% constantly then you can divide the number of CPUs by that percentage. For example if the average load of these virtual machines was 50% then theoretically you could support 36 single core virtual machines on this particular host. Of course once you get into the very high CPU count things like overhead start to come into consideration, but as a hard and fast rule it works quite well.
If I’m honest I was quite surprised with these results as I thought once I put a single extra thrashing virtual machine on the server it’d fall over in a screaming heap with the additional load. It seems though that VMware’s scheduler is smart enough to be able to service a load much higher than what the server should be capable of without affecting the other virtual machines that adversely. This is especially good news for virtual desktop deployments as typically the limiting factor there was the number of CPU cores available. If you’re an administrator of a virtual deployment I hope you found this informative and it will help you when planning future virtual deployments.
¹CPU ready time was chosen as the metric as it most aptly showcases a server’s ability to serve a virtual machine’s request of the CPU when in a heavy scheduling scenario. Usage wouldn’t be an accurate metric to use since for all these tests the blade was 100% utilized no matter the number of virtual machines running.