Posts Tagged 'automation'

Automating The Configuration of Brocade Interconnects.

Once an IT environment gets past a certain size the requirement for automation grows exponentially. When you're working in an environment with a handful of servers and a dozen or so PCs it's easy to just spend the time doing everything manually. If you're in a situation like I am right now, where a single deployment covers some 400+ physical servers, that process isn't particularly feasible, especially if you want any level of consistency across the fleet. It should come as no surprise then that I spend the vast majority of my time automating the commissioning of IT infrastructure, and since I don't want to do something 400 times it usually sees me trying to automate things that really don't want to be automated.

Dell 8/4 FC SAN Fibre Interconnect Module

Take for instance this little fellow, a Dell 8/4 Fibre Channel Interconnect module for a M1000e chassis (sounds sexy, right?). Don't let that Dell badge on the outside fool you, like a lot of Dell hardware it's actually a rebranded Brocade fibre switch under the hood, albeit with a significantly pared down feature set. For the most part it's just a dumb NPIV device that acts as a pass-through for high speed connections, but it does have a little bit of smarts in it, enough that it would typically come under the purview of your on-site storage team. However due to its pared down nature it doesn't work with any of Brocade's management software (at least none that we have here) and so the storage team wasn't particularly interested in managing it. Fair cop, but there were still a few things that needed to be configured on it, so my colleague and I set about figuring out how to do that.

Usually this is when I'll track down the CLI or automation guide for the particular product and then dig around for the commands I need to get it configured. Try as I might I couldn't find anything from Brocade themselves, as they usually recommend using something like DCFM for configuration. There is an SSH interface on the devices, which does have a rather comprehensive set of commands in it, but there didn't appear to be any way to drive those commands non-interactively. We could, of course, use something like TCL with EXPECT to essentially automate this process, but that's traditionally quite messy, so we asked our on-site Brocade resident if there was a better solution.

There isn’t, apparently.

So off we went building up a TCL file that would do the configuration for us and initially it all worked as expected (pun completely unintentional, I assure you). Our test environment worked every time once we had the initial kinks worked out of the script and we were confident enough to start moving it up the chain. Of course this is when problems started to become apparent, and during our testing we began to find some really weird behaviours coming from the switches, things that aren't mentioned anywhere and aren't obvious unless you're doing exactly what we were doing.

To build up the original TCL script I'd PuTTY into one of the switches and execute the commands, then once I'd confirmed the changes I wanted had actually been made I'd put them into the script. Pretty standard stuff, but after re-running the scripts I'd find they'd inexplicably fail at certain points, usually when attempting to reconfigure a switch that had already been deployed. Essentially I'd look for an "Access denied" message after trying the default password and then send along the correct one afterwards, as that's all that was required when using PuTTY.

However, looking at the logs, not only is the message sent back different ("Login incorrect" rather than "Access denied"), the switch doesn't just ask for the correct password, it requests the user name again as well. There are also significant differences in the way output is written between the two interfaces, which means for things like EXPECT you have to code around them or you'll end up sending input at the wrong times and reading lines you didn't want to. It's clear that there are two interfaces to the Brocade switches and they differ enough from each other to make coding against one incompatible with the other, which is just unacceptable.
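
To give you an idea of what coding around this looks like, here's a minimal sketch of the branching our EXPECT script needed in order to handle both login behaviours (the host name, account, passwords and prompt strings below are placeholders, not what our switches actually use):

#!/usr/bin/expect -f
# Handle both login behaviours seen on the Brocade interconnects.
set timeout 20
spawn ssh -o StrictHostKeyChecking=no admin@switch01

expect "password:"
send "default-password\r"

expect {
    "Access denied" {
        # Interactive-style interface: re-prompts for the password only.
        send "real-password\r"
    }
    "Login incorrect" {
        # Scripted-style interface: demands the user name again first.
        expect "login:"
        send "admin\r"
        expect "password:"
        send "real-password\r"
    }
}

expect ">"
send "switchshow\r"
expect ">"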

Realistically what's required is for Brocade to release some kind of configuration tool like Dell's RACADM, which provides a direct hook into these devices so they can be automated properly. I've found old forum posts that reference something like that for Perl but, as far as I and the Brocade people I've talked to are aware, there's nothing like that available for these particular devices. It's not like it's impossible to code EXPECT up to do what we want it to, but it's ugly, unmaintainable and likely to break with firmware updates. If there is a better solution I'd love to hear it, but after all the time I've invested in this I'm pretty sure there isn't one.

Unless Brocade has something in the works, nudge nudge 😉

The Quirks of QLogic's QAUCLI Tool (and The Perils of Shutting Down DOS).

Probably the biggest part of my job, and really it should be the biggest part of any competent administrator's job, is automation. Most system administrators start out in smaller places, usually small businesses or their own home network, where the number of machines under their control rarely exceeds single digits. At that point it's pretty easy to get by with completely manual processes and indeed it's usually much more efficient to do so. However things change rapidly as you come into environments with hundreds if not thousands of endpoints that require some kind of configuration, and at that point it's just not feasible to do it manually any more. Thus most of my time is spent finding ways to automate things, and sometimes this leads me down some pretty deep rabbit holes.

Take for instance the simple task of updating firmware.

You'd probably be surprised to find out that despite all the advances in technology over the decades firmware updates are still done through good old-fashioned DOS, especially if you're running some kind of hypervisor like VMware's ESXi. For the most part this isn't necessarily a bad thing, DOS is so incredibly well known that nearly every problem you come across has a solid solution, but it does impose a lot of limitations on what you can do. For me the task was simple: the server needed to boot up, update the required firmware and then shut down at the end so my script would know that the firmware update had completed successfully. There were other ways of doing this, like constantly querying the firmware version until it showed the updated status, but shutting down at the end would be far quicker and much more reliable (the firmware versions returned aren't always 100% accurate). Not a problem, I thought, the DOS CD I had must contain some kind of shutdown command that I can put in AUTOEXEC.BAT and we'll be done in under an hour.

I was utterly, utterly wrong.

You see DOS comes from the days when power supplies were much more physical things than they are today. When you went to turn your PC on back then you'd flip a large mechanical switch, one that was directly wired to the power supply, and it would turn on with an audible clack. Today the button you press isn't actually connected to the power supply directly, it's connected to the motherboard, and when the connection is closed it sends a signal (well, it shorts two pins) to turn it on. What this means is that DOS really didn't have any notion of shutting down a system, since you'd just yank the power out from underneath it. This is the same reason that earlier versions of Windows gave you that "It's now safe to turn off your computer" message: the OS simply wasn't able to communicate with the power supply.

There are of course a whole host of third party solutions out there, like the shutdown.com application, FDAPM from the FreeDOS guys and some ingenious abuse of the DOS DEBUG command, but unfortunately they all seemed to fail when presented with Dell hardware. As far as I can tell this is because the BIOS on the Dell M910 isn't APM aware, which means the usual way these applications talk to the power supply just won't work (FDAPM reports as much), leaving precious few options for shutting down. Frustrated, I decided that DOS might not be the best platform for updating the firmware and turned towards WinPE.

WinPE is kind of like a cut-down version of Windows (available for free, by the way) that you can boot into, usually used to deploy the operating system in large server and desktop fleets. By cut down I mean really cut down: the base ISO it creates is on the order of 140MB, meaning if you need anything in there you basically have to add it yourself. After adding in the scripting framework, injecting drivers for the 10GbE cards and loading in the QAUCLI tool I found in the Windows version of the firmware update package, I thought it would be a quick matter of executing a command line and we'd be done.
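
For reference, building an image like that with the WAIK-era tools looks roughly like the following, run from the Deployment Tools command prompt (all the paths, and the driver and tool locations, are just examples):

rem Create a working copy of WinPE and mount it for servicing.
copype.cmd amd64 C:\winpe
dism /Mount-Wim /WimFile:C:\winpe\winpe.wim /Index:1 /MountDir:C:\winpe\mount

rem Inject the QLogic 10GbE drivers and copy in the QAUCLI tool.
dism /Image:C:\winpe\mount /Add-Driver /Driver:C:\drivers\qlogic /Recurse
xcopy C:\tools\qaucli C:\winpe\mount\tools\qaucli\ /E /I

rem Commit the changes and build a bootable ISO.
dism /Unmount-Wim /MountDir:C:\winpe\mount /Commit
copy C:\winpe\winpe.wim C:\winpe\ISO\sources\boot.wim
oscdimg -n -bC:\winpe\etfsboot.com C:\winpe\ISO C:\winpe\winpe.iso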

Turns out QAUCLI is probably closer to an in-development engineering tool than a production-level application. Whilst it may have some kind of debug log somewhere (I can't for the life of me find it and the user guide doesn't list anything), I couldn't find any way to get it to give me meaningful information on what it was doing, whether it was encountering errors or whether I had executed the command incorrectly. The interactive portion of it is quite good, in fact it's almost a different tool when used interactively, but the scripted side of it just doesn't seem to work as advertised.

Here’s a list of the quirks I came across (for reference the base command I was trying to use was qaucli -pr nic -svmtool mode=update fwup=p3p11047.bin):

  • Adding output=stdout as an option will make the tool fail regardless of any other option.
  • There is no validation on whether the firmware file you give it exists, nor whether the firmware file itself is valid.
  • Upgrading/downgrading certain firmware versions will fail. I was working with some beta firmware that was supposed to fix a client issue, which may well have been the cause, but performing the same action interactively worked.
  • There is no feedback as to whether the command worked or failed beyond the execution time: if it fails to update it takes about a minute to finish, if it works it's closer to 3~5 minutes (the sketch after this list takes advantage of this).
  • Windows seems to be able to talk to some QLogic cards natively (the QME2572 fibre channel cards specifically) but not the 10GbE cards. This is pretty typical, as ESXi needs a driver to talk to these cards as well, so it's not so much a quirk of QAUCLI per se, more that if you want to flash their firmware in a WinPE environment you need to inject the drivers into the image.
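
Since execution time ended up being the only reliable success signal, the wrapper I ran inside WinPE just times the update and powers the box off when it looks like it worked. Here's a minimal sketch, assuming PowerShell has been added to the WinPE image (the firmware file name and the two minute threshold are illustrative; the same logic is easy enough in batch):

# Time the QAUCLI run; failed updates return in about a minute while
# successful ones take 3-5 minutes, so use duration as a crude signal.
$timer = [System.Diagnostics.Stopwatch]::StartNew()
& .\qaucli.exe -pr nic -svmtool mode=update fwup=p3p11047.bin
$timer.Stop()

if ($timer.Elapsed.TotalMinutes -gt 2)
{
    # The flash appears to have completed; power off so the deployment
    # framework knows this server is done.
    wpeutil shutdown
}
else
{
    Write-Host "Firmware update appears to have failed (finished too quickly)."
}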

Honestly though it could very well be my fault for tinkering with an executable that I probably shouldn't be. Try as I might I can't find a legitimate download for QAUCLI; the only way to get it is to extract the Windows installer package and pull it out of there. Still it's a valuable tool, and one I think could be a lot better than it currently is, but if you find yourself in a situation like I did hopefully these little tips will save you some frustration.

I know I would’ve appreciated them 3 days ago 😉

PowerShell: Why Did I Resist?

A good deal of any system administrator's job is automation. Even when you're working in small environments, doing the same thing on every user's machine individually is needlessly tiresome and always error prone. My current environment has well over 400 servers and at least 1000 desktops, so anything that needs to touch all of them has to be automated; there's just no other option. In the past VBScript was the be-all and end-all of Windows based scripting and it's still used as the de facto automation language in many IT shops today. However with the coming of Vista and Server 2008 we saw the introduction of a plucky new tool called PowerShell (first seen in the wild in 2006) which looked to be the next great thing for automating your IT environment. Due to Vista's poor reception, and by association Server 2008's, PowerShell didn't really take off that well. In fact I'd actively ignored it up until about 6 months ago, when I started looking at it more closely as a tool to automate some VMware tasks. Little did I know then that this new world of PowerShell would soon make up the majority of my day to day work.

Now the developers out there will know that Visual Basic (VB) is somewhat of a beginner's programming language. Sure it's feature complete when compared to its bigger brother C#, however it's rather lax with its standards and this makes code written in VB rather inelegant. This was probably why I shied away from PowerShell initially, as I thought it would just be an evolutionary step up from VBScript, but I couldn't have been more wrong. The syntax is decidedly closer to C# than VB, and while the legacy of behind-the-scenes tricks to hide complexity from the user is still there, those tricks are now available to you should you know where to look. Additionally the ease of integration with Microsoft's other coding platforms (like loading .NET dlls) is absolutely amazing, giving you the power to do almost anything you can with their other languages right there in your script.
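
As a trivial example of that integration, you can pull a framework assembly straight into a script and use its types directly (this is the old-school way of doing it; Add-Type does the same job in PowerShell 2.0):

# Load a .NET assembly and use its classes right there in the script.
[System.Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms") | Out-Null

# Anything the framework exposes is now fair game.
[System.Windows.Forms.MessageBox]::Show("Hello from .NET!", "PowerShell")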

The real kicker though is the shift in focus Microsoft took when it introduced PowerShell all those years ago. Typically their infrastructure products like Exchange or System Center were either built by separate teams or came from companies Microsoft had acquired. This meant there was no standard way of interfacing with these products, making automation a real pain and usually ending with you using a third party tool or writing reams of VBScript. For most releases since, however, Microsoft has built the management tools on top of PowerShell, meaning any action performed in the management console can be replicated via a script. This was most obvious when they released Exchange 2007, where any command you performed in the GUI would show you the PowerShell command it ran.
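
To illustrate, mailbox-enabling a user through the Exchange 2007 console finishes by showing you the shell command it ran, something along these lines (the identity and database names here are placeholders):

Enable-Mailbox -Identity "your.domain.com/Users/John Doe" -Database "MailboxDB01"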

To show you how much you can do with PowerShell I'm going to include two of my own scripts, which I invested quite a bit of time in. The first, shown below, is a script that will scan your domain and alert you when someone adds an account to the Domain Admins group:

# Grab the current members of Domain Admins and the baseline control file.
$domainAdmins = dsget group "CN=Domain Admins,CN=Users,DC=your,DC=domain,DC=com" -members -expand
$list = Get-Content C:\Directory\Where\Script\Runs\DomainAdminsList.txt

# Prepare the alert email; the addresses and server name are placeholders.
$mail = new-object System.Net.Mail.MailMessage
$mail.From = new-object System.Net.Mail.MailAddress("alerts@your.domain.com")
$mail.To.Add("you@your.domain.com")
$smtpserver = "YourSMTPServer"
$mail.Subject = "Unauthorized Domain Administrator Privileges Detected."
$smtp = new-object System.Net.Mail.SmtpClient($smtpserver)

# Alert on any current member that doesn't appear in the control file.
foreach ($domainAdmin in $domainAdmins)
{
    $found = $false
    foreach ($line in $list)
    {
        if ($domainAdmin -eq $line) { $found = $true }
    }

    # dsget output includes blank lines; ignore them.
    if ($domainAdmin -eq "") { $found = $true }

    if (-not $found)
    {
        $date = Get-Date
        $hostname = hostname
        Write-Host $domainAdmin "not found in control file."
        $mail.Body = $domainAdmin + " not found in control file. Script run on " + $hostname + " at " + $date + " using control file C:\Directory\Where\Script\Runs\DomainAdminsList.txt"
        $smtp.Send($mail)
    }
}

You'll want to first run "dsget group "CN=Domain Admins,CN=Users,DC=your,DC=domain,DC=com" -members -expand > DomainAdminsList.txt" to generate the text file of domain admins. Once you've done that you can schedule the script to run every hour or so and you'll get an email whenever someone grants an account domain administrator privileges. You can modify this for any group too, just update the first line with the DN of the group you want to scan.
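
If you want to set up the schedule from the command line as well, something like "schtasks /Create /SC HOURLY /TN DomainAdminsCheck /TR "powershell.exe -File C:\Directory\Where\Script\Runs\DomainAdminsCheck.ps1"" will do the trick (the task and script names here are just examples).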

The second is one I'm quite proud of: it will tell you when someone changes a group policy in your domain. Pretty handy when you've got a bunch of developers who have access to do that and routinely break other people's systems when they do. You'll need to grab the ListAllGPOs.ps1 script from here first (although I called it GPOList.ps1):

$GPOs = .\GPOList.ps1 -query -verbose -domain your.domain.com

# Domain controllers whose security logs will be searched.
$DCs = "DC01","DC02"

$baseline = Import-Csv GPOBaseline.csv

$outFile = "D:\Apps\Scripts\GPOScanner\Output.txt"
$outBody = "D:\Apps\Scripts\GPOScanner\OutBody.txt"
$null | Out-File $outFile
$null | Out-File $outBody
$emailRequired = $false

Write-Host "Scanning your.domain.com"
"your.domain.com" | Out-File $outFile -append
foreach ($cGPO in $GPOs)
{
    $found = $false
    foreach ($bGPO in $baseline)
    {
        if ($cGPO.ID -match $bGPO.ID)
        {
            $found = $true

            # Flag any GPO whose modification time differs from the baseline.
            if (-not $bGPO.ModificationTime.Equals($cGPO.ModificationTime.ToString()))
            {
                $output = "WARNING: GPO " + $cGPO.DisplayName + " has been modified since baseline."
                Write-Host $output
                $output | Out-File $outBody -append
                $output = "Modification time: " + $cGPO.ModificationTime
                Write-Host $output
                $output | Out-File $outBody -append
                $emailRequired = $true

                # Echo the timestamp we're hunting for, then trawl each DC's
                # security log for entries referencing this GPO's GUID.
                $cGPO.ModificationTime.AddSeconds(1).ToString()
                foreach ($dc in $DCs)
                {
                    $dc
                    $logs = [System.Diagnostics.EventLog]::GetEventLogs($dc)
                    foreach ($log in $logs)
                    {
                        if ($log.LogDisplayName -eq "Security")
                        {
                            $entries = $log.Entries
                            foreach ($entry in $entries)
                            {
                                # Object access events: 4663/4656 (Server 2008) and 560 (Server 2003).
                                if ($entry.EventID.Equals(4663) -or $entry.EventID.Equals(4656) -or $entry.EventID.Equals(560))
                                {
                                    if ($entry.Message.Contains($cGPO.ID))
                                    {
                                        $entry | fl
                                        $entry | fl | Out-File $outFile -append
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    if ($found -eq $false)
    {
        $emailRequired = $true
        $output = "New GPO " + $cGPO.DisplayName + " not found in baseline."
        Write-Host $output
        $output | Out-File $outBody -append
    }
}

if ($emailRequired)
{
    $hostname = hostname
    $date = Get-Date
    $output = "Script was run on " + $hostname + " at " + $date + " using control files located in D:\Apps\Scripts\GPOScanner. Please see the attachment for related event log information."
    $output | Out-File $outBody -append
    $mail = new-object System.Net.Mail.MailMessage
    $mail.From = new-object System.Net.Mail.MailAddress("alerts@your.domain.com")
    $mail.To.Add("you@your.domain.com")
    $smtpserver = "YourSMTPServer"
    $mail.Subject = "Group Policy Changes Detected."
    $smtp = new-object System.Net.Mail.SmtpClient($smtpserver)
    $mail.Body = (Get-Content $outBody | Out-String)
    $att = new-object Net.Mail.Attachment($outFile)
    $mail.Attachments.Add($att)
    $smtp.Send($mail)
    $att.Dispose()
}

Again, you'll want to run ".\GPOList.ps1 -query -verbose -domain your.domain.com | Export-Csv GPOBaseline.csv" first to generate the baseline. The script will first look for any changes, then scour the security logs of your domain controllers to find out who made them, sending you the log entries showing who changed what and when. Pretty neat eh?
