
External GPUs are a Solution in Search of a Problem.

If you’re a long-time PC gamer, chances are you’ve considered getting yourself a gaming laptop at one point or another. The main attraction of such a device is portability, especially back in the heyday of LANs when steel cases and giant CRTs were a right pain to lug around. However they always came at a cost, both financial and in opportunity, as once you bought yourself a gaming laptop you were locked into those specs until you bought another one. Alienware, a longtime manufacturer of gaming laptops, has cottoned on to this issue and has developed what they’re calling the Graphics Amplifier in order to bring desktop-level grunt and upgradability to their line of laptops.

Alienware Graphics Amplifier

On the surface it looks like a giant external hard drive but inside it are all the components required to run any PCIe graphics card. It contains a small circuit board with a PCIe x16 slot, a 450W power supply and a host of other connections because why not. There are no fans or anything else to speak of, however, so you’re going to want a card with a blower-style cooler on it, something you’ll only see on reference cards these days. This then connects back to an Alienware laptop through a proprietary connection (unfortunately) which allows the graphics card to act as if it were installed in the system. The enclosure retails for about $300 without a graphics card included, which means you’re up for about $600+ once you buy a card to put in it. That’s certainly not out of reach for those who are already investing $1800+ in the requisite laptop but it’s enough to make you reconsider the laptop purchase in the first place.

You see, whilst this external case does appear to work as advertised (judging by the various articles that have popped up with it), it essentially removes the most attractive thing about having a gaming-capable laptop: the portability. Sure, this is probably more portable than a mini tower and a monitor, but at the same time the case is likely to weigh more than the laptop itself and won’t fit into your laptop carry bag. The argument could be made that you wouldn’t need to take this with you, that it’s only for home use, but even then I’d argue you’d likely be better off with a gaming desktop and a slim, far more portable laptop to take with you (both of which could be had for the combined cost of this and the laptop).

Honestly though, the days have long since passed when it was necessary to upgrade your hardware on a near-yearly basis in order to play the latest games. My current rig is well over 3 years old now and is still quite capable of playing all current releases, even if I have to dial back a setting or two on occasion. With that in mind you’d be better off spending the extra cash you’d sink into this device plus a graphics card on the laptop itself, which would likely net you the same overall performance. Then, when the laptop finally starts to show its age, you’ll likely be in the market for a replacement anyway.

I’m sure there’ll be a few people out there who’ll find some value in a device like this but honestly I just can’t see it. Sure it’s a cool piece of technology, a complete product where there have only been DIY solutions in the past, but its uses are extremely limited and not likely to appeal to those it’ll be marketed to. Indeed it feels much like Razer’s modular PC project, a cool idea that simply won’t have a market to sell to. It’ll be interesting to see if this catches on, but since Alienware is the first (and only) company doing this I don’t have high hopes.

Virtual Machine CPU Over-provisioning: Results From The Real World.

Back when virtualization was just starting to make headway into the corporate IT market the main aim of the game was consolidation. Vast quantities of CPU, memory and disk resources were being squandered as servers sat idle for the vast majority of their lives, barely ever using the capacity assigned to them. Virtualization gave IT shops the ability to run many low-resource servers on the one box, significantly reducing hardware costs whilst providing a whole host of other features. It followed then that administrators looked towards over-provisioning their hosts, i.e. creating more virtual machines than the host was technically capable of handling.

The reason this works is a feature of virtualization platforms called scheduling. In essence, when you put a virtual machine on an over-provisioned host it isn’t guaranteed to get resources the moment it needs them; instead it’s scheduled on and off the physical CPUs in order to keep it and all the other virtual machines running properly. Surprisingly this works quite well, as for the most part virtual machines spend a good part of their life idle and the virtualization platform uses this information to schedule busy machines ahead of idle ones. Recently I was approached to find out what the limits were of a new piece of hardware that we had procured and I discovered some rather interesting results.
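To make the scheduling idea concrete, here’s a minimal toy model in Python. It is emphatically not how VMware’s scheduler actually works (the real thing uses proportional-share accounting and plenty of other tricks); it just illustrates how busy machines get priority over idle ones, and how “ready” time starts to accumulate once more machines want CPU than there are physical cores to go around.

# Toy model of CPU scheduling on an over-provisioned host. Purely
# illustrative: real hypervisor schedulers are far more sophisticated.
import random

def simulate(physical_cores, busy_flags, ticks=1000, idle_demand=0.05):
    """busy_flags: True for a thrashing VM, False for a mostly idle one.
    Returns the number of ticks each VM spent wanting CPU but unscheduled."""
    ready = [0] * len(busy_flags)
    for _ in range(ticks):
        # Busy VMs always want CPU; idle VMs only ask for it occasionally.
        wanting = [i for i, busy in enumerate(busy_flags)
                   if busy or random.random() < idle_demand]
        # Schedule busy VMs first, then idle ones, until the cores run out.
        wanting.sort(key=lambda i: not busy_flags[i])
        scheduled = set(wanting[:physical_cores])
        for i in wanting:
            if i not in scheduled:
                ready[i] += 1
    return ready

# 18 thrashing single-core VMs contending for 12 physical cores.
print(simulate(physical_cores=12, busy_flags=[True] * 18))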

The piece of kit in question is a Dell M610x blade server with the accompanying chassis and interconnects. The specifications we got were pretty good: a dual-processor arrangement (2 x Intel Xeon X5660) with 96GB of memory. What we were trying to find out was what kind of guidelines we should have around how many virtual machines could comfortably run on such hardware before performance started to degrade. No such testing had been done with previous hardware, so I was working in the dark on this one and devised my own test methodology in order to figure out the upper limits of over-provisioning in a virtual world.

The primary performance bottleneck for any virtual environment is the disk subsystem; you can have the fastest CPUs and oodles of RAM and still get torn down by slow disk. However most virtual hosts will use some form of shared storage, so testing that is out of the equation. The two primary resources we’re left with then are CPU and memory, and the latter is already a well-known problem space. I wasn’t able to find any good articles on CPU over-provisioning, however, so I devised some simple tests to see how the system would perform under a load well above its capabilities.

The first test was a simple baseline: since the server has 12 physical cores available (HyperThreading might say otherwise, but that’s a pipe dream) I created 12 virtual machines, each with a single core, and then loaded their CPUs to maximum capacity. Shown below is a stacked graph of each virtual machine’s ready time, which is a representation of how long the virtual machine was ready¹ to execute an instruction but was not able to get scheduled onto the CPU.

The initial part of this graph shows the machines all at idle. You’d think at that stage their ready times would be zero since there’s no load on the server; however, since VMware’s hypervisor knows when a virtual machine is idle, it won’t schedule it on as often, as the idle loops are simply wasted CPU cycles. The jumpy period after that is when I was starting up a couple of virtual machines at a time and, as you can see, those virtual machines’ ready times drop to 0. The very last part of the graph shows the ready time dropping to almost nothing for all the virtual machines, with the top grey part of the graph being the ready time of the hypervisor itself.

This test doesn’t show anything revolutionary; it’s pretty much the expected behaviour of a virtualized system. It does, however, provide a solid baseline from which to draw conclusions in further tests. The next test was to see what would happen when I doubled the workload on the server, increasing the virtual core count from 12 to a whopping 24.

For comparison’s sake, the first graph’s peak is equivalent to the first peak of the second graph. What this shows is that when the CPU is oversubscribed by 100% the wait times rocket through the roof, with the virtual machines waiting up to 10 seconds in some cases to get scheduled back onto the CPU. The average was somewhere around half a second, which for most applications is an unacceptable amount of time. Just imagine trying to use your desktop and having it freeze for half a second every 20 seconds or so; you’d say it was unusable. Taking this into consideration, we now know there must be a happy medium somewhere in between. The next test aimed right bang in the middle of these two extremes, putting 18 virtual CPUs on a 12-core host.
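As an aside, if you want to turn raw ready time figures into something more intuitive, the usual approach is to express them as a percentage of the sampling interval. The snippet below assumes the 20-second real-time sampling interval that vSphere’s performance charts typically use; if your charts sample at a different rate, adjust accordingly.

# Converting an accumulated CPU ready figure (milliseconds over a sampling
# interval) into a percentage of that interval. interval_s=20 assumes the
# real-time sampling interval used by vSphere performance charts.
def ready_percent(ready_ms, interval_s=20):
    return ready_ms / (interval_s * 1000) * 100

print(ready_percent(500))    # 2.5  - half a second of ready time per sample
print(ready_percent(10000))  # 50.0 - the 10 second figure above, read as
                             #        accumulated ready time in one sample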

Here’s where it gets interesting. The graph depicts the same test running over the entire time but, as you can see, there are very distinct sections depicting what I call different modes of operation. The lower end of the graph shows a period when the scheduler is hitting its marks and the wait times are quite low overall. The second is when the scheduler gives much more priority to the virtual machines that are thrashing their cores and the machines that aren’t doing anything get pushed to the side. However in both instances the 18 virtual cores running are able to get serviced in a maximum of 20 milliseconds or so, well within the acceptable range of most programs and user experience guidelines.

Taking all this into consideration, it’s reasonable to say that the maximum you can oversubscribe a virtual host’s CPU is about 1.5 times the number of physical cores. You can extrapolate further by taking the average load into consideration: if it sits consistently below 100% you can divide the number of virtual CPUs by that percentage. For example, if the average load of these virtual machines was 50% then theoretically you could support 36 single-core virtual machines on this particular host. Of course once you get into very high CPU counts things like scheduling overhead start to come into consideration, but as a hard and fast rule it works quite well.
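Expressed as a quick back-of-the-envelope calculation (and treat it as exactly that, a rule of thumb drawn from this one host rather than a guarantee), the sizing rule looks something like this:

# Rule-of-thumb sizing: cap vCPUs at 1.5x the physical core count, then
# scale by average utilisation when it sits below 100%.
def max_single_core_vms(physical_cores, avg_load=1.0):
    return int(physical_cores * 1.5 / avg_load)

print(max_single_core_vms(12))               # 18 at full load
print(max_single_core_vms(12, avg_load=0.5)) # 36 at 50% average load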

If I’m honest I was quite surprised by these results, as I thought that once I put a single extra thrashing virtual machine on the server it’d fall over in a screaming heap under the additional load. It seems though that VMware’s scheduler is smart enough to service a load much higher than what the server should be capable of without affecting the other virtual machines too adversely. This is especially good news for virtual desktop deployments, as the limiting factor there has typically been the number of CPU cores available. If you’re an administrator of a virtual environment I hope you found this informative and that it helps you when planning future deployments.

¹CPU ready time was chosen as the metric as it most aptly showcases a server’s ability to service a virtual machine’s request for CPU in a heavy scheduling scenario. Usage wouldn’t be an accurate metric to use since, for all these tests, the blade was 100% utilized no matter the number of virtual machines running.