Back when virtualization was just starting to make headway into the corporate IT market, the main aim of the game was consolidation. Vast quantities of CPU, memory and disk resources were being squandered as servers sat idle for the vast majority of their lives, barely ever using the capacity assigned to them. Virtualization allowed IT shops to run many low-resource servers on a single box, significantly reducing hardware costs whilst providing a whole host of other features. It followed then that administrators looked towards over-provisioning their hosts, i.e. creating more virtual machines than the host was technically capable of handling.
The reason this works is a feature of virtualization platforms called scheduling. In essence, a virtual machine on an over-provisioned host is not guaranteed to get resources the moment it needs them; instead it is scheduled on and off the physical CPUs in order to keep it and all the other virtual machines running properly. Surprisingly this works quite well, as for the most part virtual machines spend a good part of their lives idle, and the virtualization platform uses this information to schedule busy machines ahead of idle ones. Recently I was approached to find out what the limits were of a new piece of hardware we had procured, and I discovered some rather interesting results.
The piece of kit in question is a Dell M610x blade server with the accompanying chassis and interconnects. The specifications we got were pretty good: a dual processor arrangement (2 x Intel Xeon X5660) with 96GB of memory. What we were trying to find out was what kind of guidelines we should have around how many virtual machines could comfortably run on such hardware before performance started to degrade. No such testing had been done with previous hardware, so I was working in the dark on this one and devised my own test methodology to figure out the upper limits of over-provisioning in a virtual world.
The primary performance bottleneck for any virtual environment is the disk subsystem. You can have the fastest CPUs and oodles of RAM and still get torn down by slow disk. However, most virtual hosts use some form of shared storage, so testing that was out of scope. The two primary resources we're left with, then, are CPU and memory, and the latter is already a well-known problem space. I wasn't able to find any good articles on CPU over-provisioning, however, so I devised some simple tests to see how the system would perform under a load well above its capabilities.
The first test was a simple baseline. Since the server has 12 physical cores available (HyperThreading might say you get another core per core, but that's a pipe dream) I created 12 virtual machines, each with a single core, and then fully loaded their CPUs. Shown below is a stacked graph of each virtual machine's ready time, which represents how long a virtual machine was ready¹ to execute an instruction but was unable to get scheduled onto a CPU.
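For anyone wanting to repeat the exercise, the full-load phase inside each guest can be reproduced with nothing fancier than a busy loop per core. This is a minimal Python sketch of that idea, not the exact tool I used for the test:

```python
import multiprocessing
import time

def burn(seconds):
    """Spin in a tight loop, holding one core at 100% utilisation."""
    end = time.time() + seconds
    while time.time() < end:
        pass  # busy-wait; the guest OS reports this as full CPU load

def load_all_cores(seconds):
    """Start one busy-loop worker per CPU core visible to the guest."""
    workers = [multiprocessing.Process(target=burn, args=(seconds,))
               for _ in range(multiprocessing.cpu_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Running `load_all_cores(600)` inside each virtual machine pegs every vCPU it has been given, which is all the hypervisor's scheduler sees from the outside.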
The initial part of this graph shows the machines all at idle. You'd think at that stage their ready times would be zero, since there's no load on the server. However, since VMware's hypervisor knows when a virtual machine is idle, it won't schedule it on as often, as the idle loops are simply wasted CPU cycles. The jumpy period after that is when I was starting up a couple of virtual machines at a time, and as you can see those virtual machines' ready times drop to 0. The very last part of the graph shows the ready time dropping to almost nothing for all the virtual machines, with the top grey part of the graph being the ready time of the hypervisor itself.
This test doesn't show anything revolutionary; it's pretty much the expected behaviour of a virtualized system. It does however provide a solid baseline from which to draw conclusions in further tests. The next test was to see what would happen when I doubled the workload on the server, increasing the virtual core count from 12 to a whopping 24.
For comparison's sake, the first graph's peak is equivalent to the first peak of the second graph. What this shows is that when the CPU is oversubscribed by 100%, ready times rocket through the roof, with virtual machines waiting up to 10 seconds in some cases to get scheduled back onto the CPU. The average was somewhere around half a second, which for most applications is an unacceptable amount of time. Just imagine trying to use your desktop and having it freeze for half a second every 20 seconds or so; you'd call it unusable. Taking this into consideration, we know there must be a happy medium between the two extremes. The next test aimed right bang in the middle of them, putting 18 vCPUs on a 12 core host.
Here's where it gets interesting. The graph depicts the same test running over the entire time but, as you can see, there are very distinct sections depicting what I call different modes of operation. The lower end of the graph shows a period when the scheduler is hitting its marks and wait times are quite low overall. The second is when the scheduler gives much more priority to the virtual machines that are thrashing their cores, and the machines that aren't doing anything get pushed to the side. In both instances, however, the 18 running cores are able to get serviced within a maximum of 20 milliseconds or so, well within the acceptable range of most programs and user experience guidelines.
Taking all this into consideration, it's reasonable to say that the maximum you can oversubscribe a virtual host's CPU is 1.5 times the number of physical cores. You can extrapolate further by taking the average load into consideration: if it's constantly below 100%, you can divide the number of vCPUs by that percentage. For example, if the average load of these virtual machines was 50%, then theoretically you could support 36 single-core virtual machines on this particular host. Of course, once you get to very high CPU counts things like scheduling overhead start to come into consideration, but as a hard and fast rule it works quite well.
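The arithmetic behind that rule of thumb fits in a few lines; this is just a sketch of the reasoning above, not a VMware sizing tool:

```python
def max_single_core_vms(physical_cores, avg_load=1.0, ratio=1.5):
    """Estimate how many single-vCPU machines a host can carry.

    physical_cores: real cores on the host (HyperThreading ignored)
    avg_load:       average CPU utilisation of each guest (0.0 - 1.0)
    ratio:          the safe oversubscription ratio found above (1.5x)
    """
    # 1.5x the physical cores, then scale up for guests that idle
    return int(physical_cores * ratio / avg_load)

# The test host: 12 physical cores
print(max_single_core_vms(12))                 # 18 fully loaded guests
print(max_single_core_vms(12, avg_load=0.5))   # 36 guests at 50% average load
```

The `avg_load` scaling is the extrapolation step, and it is exactly where the caveat about very high CPU counts bites: the formula knows nothing about scheduling overhead.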
If I'm honest, I was quite surprised by these results, as I thought that once I put a single extra thrashing virtual machine on the server it would fall over in a screaming heap under the additional load. It seems, though, that VMware's scheduler is smart enough to service a load much higher than what the server should be capable of without affecting the other virtual machines too adversely. This is especially good news for virtual desktop deployments, where typically the limiting factor was the number of CPU cores available. If you're an administrator of a virtual deployment, I hope you found this informative and that it helps when planning future deployments.
¹CPU ready time was chosen as the metric because it most aptly showcases a server's ability to service a virtual machine's requests for CPU in a heavy scheduling scenario. Usage wouldn't be an accurate metric, since for all of these tests the blade was 100% utilised no matter how many virtual machines were running.
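One practical note for anyone reproducing these graphs: vSphere's performance charts report ready time as a millisecond summation per sampling interval, so converting it to the percentage form people usually quote is a one-liner. The 20-second default below is an assumption based on the real-time chart interval; check your own chart's interval before trusting the numbers:

```python
def ready_percent(ready_ms, interval_seconds=20):
    """Convert a CPU ready summation (ms) to a percentage of the interval.

    ready_ms:         the summation value from the performance chart
    interval_seconds: the chart's sampling interval (assumed 20s here)
    """
    return ready_ms / (interval_seconds * 1000) * 100

# A guest waiting 10 of every 20 seconds has 50% ready time
print(ready_percent(10000))  # 50.0
```

By this yardstick, the half-second average wait seen in the 24-vCPU test works out to around 2.5% ready time per 20-second sample, which is why it feels like a periodic freeze rather than constant sluggishness.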
It was late 2005; I had just sworn off World of Warcraft forever and had begun reintegrating myself with the world of gaming outside the single window I'd been staring at for just over a year. On the horizon was the promise of something new, something revolutionary: the PlayStation 3, which had been rumoured about for the longest time and was finally beginning to take shape. Best of all, some of the launch titles were beginning to trickle into the news stories, and they promised us the world with visuals and games unlike any that came before. One of those games was White Knight Chronicles, an RPG that showcased a beautiful menu system, battles of the most epic proportions and a story to bind it all together.
I was hooked. That game would be mine.
There I was on the PS3 launch night, sitting in the Belconnen Westfield food court with 50 or so like-minded souls waiting for EB Games to open so I could get my console. My ever-patient fiancée (now wife) was sitting by my side, knowing why I had to be there at this ungodly hour. The doors finally opened, and since there was a queue system for getting the consoles out of there in an orderly fashion I took it upon myself to check out the games available. I mean, there was no point in getting a console without a game, right? Problem was I couldn't find the prize I had been yearning for; White Knight Chronicles was nowhere to be found. I could have sworn I heard someone say they'd picked up a copy, but talking with my friends it seems I may have just misheard someone buying Fight Night.
I returned home, confused.
The next day was filled with Internet searches, forum posts and fleeting conversations with friends. Finally I came across some articles saying that White Knight Chronicles was going to be released in Japan before anywhere else, and that it had been delayed until some unknown time in the future. I sank back into my chair, defeated, feeling like something great had been snatched from my grasp. Still, the fervent excitement I felt didn't let go, and I spent the next two years devouring every little detail I could find about the game. Eventually the game was released in 2008 to the Japanese market, and I knew it wouldn't be long before I had it in my grasp.
A year and two months passed before I was able to get my hands on the game. It came out amongst several other large releases at the time, so I didn't get it straight away, but every time I walked in to pick up another title I'd see it on the shelves, tempting me with the thought of fulfilling promises long forgotten. A month or so passed and, having returned from my blockbuster gaming binge, I couldn't resist any longer; I bought the game and took it home, wasting no time before placing that magical disc into my now 4-year-old PlayStation 3. The menu came up and I started playing, but something was wrong.
Almost 45 minutes passed before I actually got to play the game. This wasn't all for patching or firmware updates, which took less than 10 minutes; no, the game took me through so many in-game cinematics that I wasn't allowed to actually do anything until they were done. The next 30 minutes were filled with me running crazily through the town trying to figure out where I needed to be. Finally I found the mission and was sent to another town, which I had to make my way to through a forest filled with possible enemies. Two hours later I discovered what the game was: a single-player version of World of Warcraft, and one that was none too good at that.
I was devastated; the game that had been hyped so much in my head for the past 5 years turned out to be a turd. I tried several times to play it again but there just wasn't anything interesting about it that could keep me coming back. I put the game in the drawer, resigned myself to forgetting about it and resolved never, ever to get so drawn into the hype again, lest I be caught in a devilish web like the one White Knight Chronicles had spun for me.
So now whenever something is announced or hyped I usually don't go much deeper into it than the basic facts, like its release date and who's developing it. White Knight Chronicles wasn't the only game to be ruined (wholly or in part) by its hype; Modern Warfare 2's "shocking" scene was almost utterly lost on me because of all the talk about it. Sure, there are plenty of games I get really excited about (Mass Effect 3, for example), but apart from knowing they're being developed and should be awesome I don't trouble myself with the details, lest they fall short of my crazy expectations. This means I may miss a few things, but in the long run I get to play the games with fewer preconceptions, so the games can stand by themselves, as I believe they should.
Was I solely to blame for getting too caught up in the hype? Most definitely. Had I adopted my current regime of letting the hype slide until after I'd played the game, I might have lasted long enough for White Knight Chronicles to shine, and instead you'd be reading a review of it rather than a rant. Still, I believe I'm better served by this minimalist approach; realistically it was only a matter of time before I got so caught up in something that the above story happened again. So if I seem disinterested when you're really excited about a game, it's nothing personal; I just want to make sure the game doesn't ruin itself before I've had the chance to play it.