Posts Tagged ‘problems’

Goodbye Spirit, We’ll See You Soon.

Bar the shuttle, there’s only been one mission in recent memory that has managed to capture the attention and imagination of nearly the entire world. That mission is the Mars Exploration Rovers, a pair of plucky little explorers that touched down on Mars almost 7 years ago today, beginning a truly epic journey that lasted well past their expected lifetime. They also hold the distinction of having been conceived, built, launched and having spent the better part of a decade on one of our closest neighbours in the universe in the time it has taken Duke Nukem Forever to be developed. Their impact on the world and our understanding of the universe cannot be overstated, and it is with a heavy heart that I bring you this news today.

Spirit, formally known as MER-A, has been out of contact with Earth for too long, and we will no longer be attempting to reach it.

Even though they were, for all intents and purposes, identical twins, Spirit always had the hardest time on our red sister. For the first couple of years they were both chugging along quite well, but in mid-March 2006 Spirit’s front right wheel locked up and failed to respond. This meant that for most of its life Spirit was driving around backwards, dragging the dead wheel behind it. It was both a blessing and a curse for the little rover, as the dragging wheel let it image the trench it was leaving behind, providing some insight that we weren’t expecting. There was a brief moment of excitement when the wheel began to respond again, but it failed once more shortly after. The rear right wheel also suffered a similar fate several years later.

Then in 2009 Spirit became stuck in a soft patch of Martian soil. At the time it didn’t seem like a big deal, since NASA had been in similar situations with both rovers before and had managed to free them, but this one presented some major challenges. The soil was an insidious mix made up mostly of iron sulfate, which has poor cohesion and acts like quicksand on the rover’s wheels. NASA then spent 9 months testing various scenarios on Earth in a desperate attempt to free the craft ahead of the harsh Martian winter, before finally giving up and declaring Spirit a stationary research platform.

With the rover stuck in the soil, it was unable to orient its solar panels at a favourable angle to generate enough electricity to keep its components warm during the Martian winter. This meant that once winter came, the rover’s electronics would likely be subjected to temperatures far below what they were designed to handle, probably killing it in the process. It’s the same problem that faced the Phoenix Lander, and the unfortunate truth is that Spirit, like Phoenix, didn’t survive the winter. Spirit went dark on March 22, 2010, and all attempts to contact it since then have been met with silence. This means that the rover is no longer functioning, frozen in its final resting place.

Spirit may no longer be communicating with us, but its mission lives on in its twin, Opportunity, and in its future successor, the Mars Science Laboratory rover Curiosity. There’s also the very real possibility that SpaceX will be launching a mission to Mars in the near future, which means we humans could be meeting up with our robotic creations much sooner than we think. So while writing this article brought a tear to my eye, I know that Spirit won’t be alone in the Martian soil for long and we’ll be seeing it again very soon.

So long Spirit.

Shit’s Breaking Everywhere, Captain.

So it turns out that my blog has been down for the last 2 days and I, in my infinite wisdom, failed to notice. It seems that no matter how I set this thing up, it ends up causing some problem that inevitably brings the whole server to its knees, killing it quietly whilst I go about my business. Now this isn’t news to anyone who’s read my blog for any length of time, but this time it eerily coincided with my main machine “forgetting” its main partition, leaving me with no website and a machine that refused to boot.

Realistically I’m the victim of my own doing, since my main machine is getting a bit long in the tooth (almost 3 years old now by my guess), but even before it hit the 6 month mark I was having problems. Back then it was an extremely obscure issue that only seemed to crop up in certain games, where I couldn’t get more than 30 seconds into playing before the whole machine froze and repeatedly played the last second of sound until I pulled the plug. That turned out to be RAM requiring more voltage than its rating said it did, and everything seemed to run fine until I hit a string of hard drives that magically forgot their partitions (yes, in much the same fashion as my current one did). Most recently it has taken to hating having all of its RAM slots filled, even though each stick works fine in isolation. Maybe it’s time this bugger went the way of Old Yeller.

Usually a rebuild isn’t much of a hassle for someone like me. It’s a pain to be sure, but the payoff at the end is a much leaner and meaner rig that runs everything faster than it did before. This time around, however, it also meant configuring my development environment again whilst making sure that none of my code had suffered in the apparent partition failure. I’m glad to say that whilst it did kill a good couple of hours I was otherwise planning to spend lazing about, I have got everything functional again and no code was harmed in the exercise.

You might be wondering why the hell I’m bothering to post this, since it’s such a non-event. Well, for the most part it’s to satisfy that part of me that likes to blog every day (no matter how hard I try to quell him), but it’s also to make sure the whole thing is running again and that Google knows my site hasn’t completely disappeared. So for those of you who were expecting something more I’m deeply sorry, but until the new year comes along I’m not sure how much blogging I’m going to be doing, let alone the well-thought-out pieces I usually manage at least a couple of times a week 😉

Don’t Anthropomorphize Computers, They Hate it When You do That.

My parents always used to tell me that bad things came in threes. Whenever I thought about it there were always 2 other bad things that had happened around the same time, so it seemed to make sense. Of course it’s just a convenient way of rationalising away coincidences, since something bad will always end up happening to you and the rule is so loose that those three things could span quite a long period. Still, yesterday seemed to be one of those days where at least three things went completely tits up on me in quick succession, sullying what would otherwise have been quite a cheerful day. The common thread of this whole debacle was, of course, computers; the one thing I get paid to be an expert on is most often the cause of my troubles.

The day started off pretty well. My MacBook Pro had arrived and I cheerfully went down to the depot to pick it up. A quick chat and a signature later I had my shiny new toy, which I was all too eager to get my hands on. There was enough for me to do at work that I wasn’t bored, yet I had enough time that I didn’t feel pressured about anything. A few good emails from close friends reassured me that my not-so-secret project was on track to actually be useful to some people, rather than just me deluding myself into thinking so. It all came undone about 15 minutes before I was due to leave work and cascaded from there.

Part of the environment I’m responsible for went, for lack of a better word, completely berko. Some people couldn’t access machines whilst others just refused to start. After spending an hour trying various solutions, I knew I wouldn’t solve this problem within the next 3 hours, so I set up a few things that would hopefully get the system to rectify itself and ran out the door as quickly as I could. After nearly an hour stuck in traffic I was finally home and ready to unbox the prize I had been waiting so long for, and it was well worth the wait.

Whilst I’ll do a full review of the MacBook Pro a little later (once I’ve got to know it better), I will say that it’s quite a slick piece of hardware. After fooling around in OS X for all of 20 minutes I fired up Boot Camp and started the unholy process of installing Windows 7 on it. To Apple’s credit this process was quite smooth, and in under an hour from first unboxing it I was up and running without a single hiccup along the way. After declaring that a success I decided to reward myself with a little Starcraft 2, and that’s when my PC got jealous.

You see, I have a rather… chequered record when it comes to my personal PCs. They almost always have their quirks in one way or another, usually because I either did something to them or didn’t bother to fix or replace a certain piece of hardware. My current desktop is no exception: up until recently it suffered from a hard drive that would erase the MBR every so often, along with being as slow as a wet dog in molasses. Before that it was memory problems that would cause it to lock up not 10 seconds into any game, and before that it was a set of 8800GTs that would work most of the time and then repeatedly crash for no apparent reason. Anyone who talked to me about it knew I had a habit of threatening the PC into working, which seemed to work surprisingly often. I wasn’t above parading the gutted corpses of its former companions around as a warning to my PC should it misbehave, much to the puzzlement of my wife.

For the most part, though, the last couple of months have been pretty good. Ever since upgrading the drives in my PC to 2 Samsung Spinpoint F3s (faster than Raptors and costing almost nothing) I’ve had a pretty good run, with the only issues being software related. The past few days, though, my PC has decided to just up and shut itself down randomly without so much as a hint as to what went wrong. Initially I thought it was overheating, so I upped the fan speeds and everything seemed to run smoothly again. Last night, however, the same problem happened again (right in the middle of a game, no less), but this time the PC failed to recover afterwards, not even wanting to POST.

You could say it was serendipitous that I managed to get myself a new laptop just as my PC carked it, but to me it feels like my trouble-child PC throwing a jealous fit at the new arrival in the house. My server and media PC both know that I won’t take that sort of shenanigans from them, as I’ll gut one of them to fix the other should the need arise. My PC, on the other hand, seems to know that no matter how much shit it drags me through, I’ll always come crawling back with components in hand, hoping to revive it.

My house is a testament to the adage that a mechanic’s car is always on the verge of breaking down. My PC deciding to die last night was frustrating, but it also let me indulge in some good old-fashioned hardware ogling, filling my head with dreams of new bits of hardware and the joys they may bring. My quick research into the problem suggests there will probably be an easy fix, so it’s not all bad. Still, at 10:00PM last night part of my head was screaming the rule of three at me, but I managed to drown that out with some good beer and an episode of Eureka.

Now to prepare the sacrificial motherboard for the ritual tonight… 😀

We Have the most Interesting Problems.

No matter what you do, you’ve got to have a bit of pride in what you’re doing. I’d love to tell everyone that my sense of pride in my work comes from my long line of successful projects, which I will admit do give me a warm and fuzzy feeling, but more and more I think it comes down to this: give me any IT system known to man, be it a personal computer or corporate infrastructure, and I guarantee I’ll find a problem that no one has ever seen before and that no one will even try to explain.

This came up recently with the blade implementation I mentioned a while ago. Everything had been going great, with our whole environment able to run comfortably on a single blade. Whilst I was migrating everything across, something happened that managed to knock one of our 2 blades offline. No worries, I thought to myself, I had enabled HA (VMware High Availability) on the farm, so all the virtual machines would magically reappear on the other blade. Not 2 minutes later our other blade server dropped off the network, taking all the (non-production, thank heavens) servers offline. After spending a lot of time getting this up and running I was more than a little irked that it had developed a problem like this, but I set about finding the cause.

That was about 2 weeks ago, and I thought I had nipped it in the bud when I found the machines responsible and modified their configuration so they’d behave. Then I was reconfiguring some network properties on one machine when I suddenly lost connection again. Knowing that this could happen, I had made sure to move most of the servers off before attempting this, so we didn’t lose our entire environment this time around. However, what troubled me wasn’t the blade dropping off the network; it was how I managed to trigger it (a bit of shop talk follows).

VMware’s hypervisor is supposed to abstract the physical hardware away from the guest operating system so that you can easily divvy it up and get more use out of a server. As such, it’s pretty rare for a change made within a guest to affect the physical hardware. However, when I changed one network adapter within a guest from a static address (it had been on a different subnet prior to migration) to DHCP, I completely lost network connectivity to both the guest and the host. It seems that VMware, HP blades and the Windows TCP/IP stack form a magic combination: when you do what I did, the network stack on the VMware host gets corrupted. (I’ve confirmed it’s not the Virtual Connect module or anything else, since I had virtual machines running perfectly well in the same chassis on a different blade.)

I’ve struggled with similar things on my own personal computer for years. My current machine suffers from random BSODs that I’m sure are due to the motherboard, which is unfortunately the only component I can’t easily replace. Every phone I’ve had for the past 3 years has suffered from one problem or another that rendered it useless for extended periods of time. From all this I’ve come to the conclusion that, since I’m supposed to be an expert with technology, I will inherently get the worst problems.

It’s not all bad, though. With problems like these comes experience. Just as with my early projects that ultimately failed to deliver (granted, one of those was a university project and the other was woefully under-resourced), I learnt what can go wrong and where, and had to develop troubleshooting skills to cope. I don’t think I’d know nearly as much about technology today if I hadn’t had so many things break on me. This quote sums it up so well for me:

I’ve missed more than 9,000 shots in my career. I’ve lost almost 300 games. 26 times I’ve been trusted to take the game winning shot and missed. I’ve failed over and over and over again in my life and that is why I succeed.

That quote is from Michael Jordan, a man who is constantly associated with success yet attributes it to his failures, something I can attest to. It also speaks to the engineer in me: with any engineering project the first implementation should never be the one delivered, since revising each implementation lets you learn where you made mistakes and correct them. There’s only so much you can learn from getting it right.

This still doesn’t stop me from wanting to thrash my computer for its dissent against me, however 🙂