Friday, November 4, 2016

Welcome to IT

Let's say you work at a company that is a large small business ($40-50 million in yearly revenue, 100-200 people). Your IT department is a 1-3 man team, because "you're an expense"... most business people think only salespeople make them money. Don't worry that you can't make money if shit doesn't work, only sales makes you money.

Now let's pretend your last major server upgrade was accomplished with a $75,000 budget. Getting that budget for the equipment you insisted was required was hard fought. Some corners were cut on "not absolutely necessary" things, things like a second, slightly smaller and slightly slower server to run as a mirror of the first one, a server you could do all your testing on. That "saved" the company $30,000, right? You just like to spend money, you never make the company any money.

Then, a year later, you have something that absolutely has to be done to the server. You are pretty sure it will work and your outside support people are confident it will work, but you have no server to test it on because all your other servers are much too small to handle it or are already tasked with other "critical" services. So you go with your best judgment and go live with a big change during the wee hours to cause the least interruption.

1 AM SHIT GOES BAD.

Now you're scrambling. By 5AM you're in a frantic attempt to get back online before major business starts. Nothing you or your vendor have tried has worked; they've called in a half dozen of their T3s and developers, all to no avail. People are rolling in, shit isn't working. Calls are happening. Pages are going out. 6AM, the owner rolls in. His shit isn't working. You're now thinking about reverting to last night's backup because the changes you were told would work without a hitch were nothing but a giant frozen boot in the nutsack hitch. People are getting really frantic about not being able to do business: nobody can order anything, nobody can sell anything, nobody can maintain inventory, nobody can do anything but sit around with their thumbs up their asses and surf the web. You're just an expense, you don't make the company money.

6:30AM, you make the decision to give up attempts at fixing and instead roll back to the last backup. You start the restore, telling everyone, "This should be resolved by 9:30AM. Everyone we have is on it, and a full restore should take 2 or 3 hours, tops."

9:35 rolls around, 9:40... At 10:15 the restore fails at the last step. What the fuck? How the fuck? This is impossible! You make some calls, you explain that you have to attempt rolling back to the offsite backup, and yes, you understand that will lose half the day's business and everything will have to be manually entered when the system is back up. You're given the "Well for Christ's sake, get it back up, what do we pay you for!?!" (That's the go-ahead. They have utmost confidence in your abilities.) You start the other restore. It works, but it's much slower than the onsite one because fibre is only so fast. 3:00PM, you're back online, and things seem to be stable again.

3:30, nobody in IT has slept in 32 hours. You're called into a meeting with management. People want answers. You explain that you were assured by the vendor that everything would go smoothly, and you tell them that you were confident in your role in the upgrade as well. What should have been a 2-hour downtime during the night turned into a 17-hour ordeal. It was an unforeseeable incident. You mention that, "Had we had a working test environment to try this on first, we would have discovered the problem and avoided it."

Nobody wants to hear it. Everything is about reentering the previous day's sales, orders, receivables, inventory adjustments, etc. By 4:30 the business day is basically a wipe. The downtime has cost the company a couple hundred thousand dollars in lost business for the day. You're just another expense, you don't make the company any money.

Nobody learns from it other than yourself, a few other people in IT, and the vendor who "has never seen this problem before".

Your request for a new sandbox server is declined. Your request for a second local backup server is seen as "another" frivolous idea.

You're just another expense, you don't make the company any money.

Welcome to IT.
