A Week Ago Today...

7/6/2005

A week ago today, I was on a family vacation.  It was our yearly trip involving my parents, my brother’s family, and my own family.  However, was I taking time off and enjoying the trip?  Nooo.  You see, I had just received one of those system-related disturbing voice mail messages that you never want to hear anytime, let alone on vacation.  The message was that one of our servers had crashed – it was the key server hosting ProModules Online, the SSP unlock mechanism, SSP updater support, and email.

 

This server, a Dell PowerEdge 1550, crashed hard, taking the drive's boot sector with it, apparently as the result of a hardware failure.  Plan A was to recover and restore this server.  Plan B was to bring up a secondary server.  For Plan A, I sent the server to a local outfit, Sawtooth Technology, where they began the work of recovering and restoring the system.  I spent the day working on Plan B.

 

A few things made this process even more challenging than it already was.  The place where we were staying had no broadband connection.  I’ve become used to having broadband at just about every hotel I have stayed at in the last couple of years.  But here, there was no choice but to revert to a 24k dial-up connection using my dusty old CompuServe account.  Furthermore, this area had no cell phone coverage, so I could not use the phone and work online simultaneously.  Apparently, the little cabin we were staying in, at Wallowa Lake, Oregon, is one of the few facilities there that even provides a phone line.

 

We did finally succeed with both plans: the original server was fully recovered with no loss of data, and a secondary server was brought online for the critical functions.  However, a day and a half of my vacation had been shot, Reid put in two 12-hour days and even missed a day of summer school, and the Sawtooth guys put in extra hours as well.  My wife also bore the burden of this stress and found it difficult to relax those days.

 

Ironically, we were in the process of decommissioning this particular server from critical use (it has given us problems in the past).  The crash occurred during one of the steps of that process.  The services provided by this server were offline far longer than they should have been in this day and age, but I am thankful that we got back up and running with all data recovered and even moved forward in the migration process as well.  My family and I did get to spend the few remaining days of our vacation doing fun things.  Later today we meet with the Sawtooth guys to discuss how to beef up our disaster recovery process.  I want to make that the last vacation disturbed by any server problems!

(BE8)

 