How many of you software engineers have had this happen to you in your career? You’ve been pounding away on a hot keyboard all day long, writing code, sending out e-mail status reports, sitting through endless back-to-back meetings, and handling the myriad other tasks that demand a software developer’s time.
The clock finally reads 5 pm, and you’ve been able to survive yet another day in the battle for code monkey supremacy in the IT world.
Your thoughts start drifting toward what you’re going to eat for supper, the movie you plan on watching with the family, and your plans for a nice long weekend, when all of a sudden you get an urgent text message from your boss: one of the critical applications your business team is responsible for has gone haywire.
It’s one of the applications that bring in major revenue for the company and help justify and pay for the existence of software programmers like yourself. Any major downtime means lost revenue and, potentially, lost credibility and respect with your customers. And if there’s one thing your company can’t afford, it’s losing customers and revenue.
Your boss declares an all-hands-on-deck emergency: every team member stays at work until the crisis is completely resolved.
This means all vacation requests that fall during the emergency are on hold, and every software engineer is expected to come in early in the morning and stay as late in the evening as possible.
And with remote work software like WebEx, it’s very easy to keep attending online team meetings from either your computer or your smartphone.
I doubt there is a single software developer on the planet who hasn’t experienced this kind of emergency situation in their career.
And if you haven’t yet experienced an IT crisis, rest assured: it WILL happen, sooner or later, in your professional career.
My First IT Crisis
I remember the first time I personally experienced an IT crisis. I was working for a platform-as-a-service company in the online banking industry at the time.
One of our banking clients wanted to integrate with our core banking software platform, and I was the lead software engineer tasked with completing the integration between the bank’s core platform and our own.
At first, everything went full steam ahead. I knew what I had to accomplish and how I was going to get there.
But there’s a famous saying in military strategy: “No battle plan survives contact with the enemy.”
That is exactly what happened to my high hopes that this project would be a slam dunk. This was early in my professional software development career, and I was confident that getting this project done would help propel my career.
That is, until I kept hitting a wall: I just couldn’t figure out how to successfully integrate our core platform with our customer’s system.
That is when my manager and the executives began to worry about the project’s slow progress. Things got progressively worse as I hit the same wall day after day.
And the pressure slowly but surely ratcheted up. I was expected to work nonstop solely on this project, from early in the morning until late in the evening. I was also expected to provide numerous status updates on my progress to my manager and upper management.
It was my first pressure-cooker experience, and believe me when I say it was very unpleasant and highly stressful. I was starting to lose sleep and agonize over my inability to finish the project. When you face failure upon failure, no matter how hard you try, it’s very easy to get discouraged.
At the time, I didn’t understand why so much pressure was brought to bear on me and my immediate team. It’s important to realize just how dependent an organization is on its software products and services when they are its main source of revenue.
Imagine what would happen if a company’s core “crown jewel” software product was under threat of catastrophic failure.
The Enterprise IT Emergency
This is exactly what happened to Microsoft in the Windows XP era. The operating system was plagued with security holes and wide-open exploits that allowed devious hackers to take control of your Windows machine.
The problem escalated so severely that the major press outlets picked up the story. It was no surprise that Microsoft declared an all-hands-on-deck war room situation. They knew full well that if they ignored or downplayed the problem, it could cost them massive sales and revenue and give competitors an opportunity to steal away lots of customers.
It’s why the next version of Windows, Vista, had beefed-up security throughout the operating system. Security was the utmost priority because of all the bad PR generated by the exploits and attack vectors Windows XP offered hackers.
And you can bet dollars to donuts that Bill Gates and the top executive staff put as much pressure as humanly possible on everyone reporting to them to make fixing this issue the top priority.
In these kinds of all-hands-on-deck scenarios, there is simply no way to resolve the issue without everyone rolling up their sleeves and putting all their focus and effort into resolving the problem.
The trouble starts when all this pressure is brought to bear on the engineering staff tasked with fixing the problem.
When all eyes are on you to get something fixed, it’s easy to make mistakes and rash decisions in an attempt to fix the problem.
This is why it’s so crucial for software engineers to remain calm, above all else, even in high-pressure emergency situations.
When you allow emotions and panic to seep into your thoughts, it’s only too easy to get into the dangerous mode of trying anything to fix the problem, essentially throwing spaghetti against the wall to see if anything sticks.
This kind of trial-and-error troubleshooting is the worst way to work on a problem. It wastes time and resources and ends up sending everyone on a desperate wild goose chase.
Best software practices are best software practices, even in emergency situations. A software engineer must keep applying all the best practices and principles of proper software design and architecture. They must also keep in mind the importance of test-driven development and of using hard evidence and test results to prove or disprove potential solutions to the problem.
That said, it is perfectly reasonable for a software engineer to consider quick-and-dirty “duct tape” solutions to fix a particular problem.
If a duct tape solution is the only feasible way to fix an immediate problem, and the only alternative is risking the loss of a customer or major revenue, a software engineer needs to convey this clearly to management.
Duct tape solutions, by their very nature, are very different from PROPER long-term solutions. They are temporary band-aids applied to a problem to buy the company a little more time to come up with a lasting solution.
The Critical Skill You Need to Survive an IT Crisis
It’s in situations like these that communication skills are of the utmost importance. Software engineers absolutely need to be able to communicate clearly with both technical and non-technical staff about the problem at hand.
Managers and executives live and die by communication, and they are all watching intently as the engineering staff works to resolve the emergency.
The better software engineers communicate, the more they reduce the overall pressure and stress. Even when there is no immediate solution at hand, frequent communication and transparency will be appreciated. The one thing that drives managers and executives crazy with nervousness is not knowing the current status of the problem.
Learning how to write clear communications as a software engineer is just as important as learning how to code, perhaps even more so. At the end of the day, software is just a means to an end: helping your organization either save or make money.
The Crisis Doesn’t Stop with the Stop-Gap
This doesn’t mean software engineers should accept the temporary band-aid solution as the long-term solution.
The band-aid solution is just that: a TEMPORARY stop-gap until a longer-term solution is architected and applied.
If a proper longer-term solution isn’t devised and put into place, there is a very high likelihood that the stop-gap will either create more problems than it solves over the long run or simply stop working at some point, leaving us all back at square one.
The other important factor to consider for the software engineer directly involved in solving the problem is the matter of overtime.
In all-hands-on-deck emergencies, there is an unwritten understanding among technical staff that solving the problem will most likely require a lot of time beyond the standard eight-hour workday.
When a town is in danger of flooding, you will often see workers and civilians laboring into the wee hours of the morning, furiously putting up sandbags and barricades to keep the floodwaters out. Without a shared understanding that emergencies like that demand a lot of extra time and effort, the town is in real danger of ending up underwater, with extensive property damage and potentially human lives at stake.
Software engineers need to mentally prepare for this expectation. I doubt there is a single organization on the planet that would disagree with this philosophy.
That being said, there’s a point where working so many overtime hours becomes counterproductive. Without proper rest, a software engineer will begin to make critical mistakes that will inevitably reduce the chances of overall success.
When software engineers feel they have reached that point, they must not be afraid to relay their concerns to management and executive staff. Getting exhausted and stressed out by working on a problem without any rest won’t help anyone.
It’s also important for software engineers to admit when they need additional help. Organizations need to recognize the importance of the “many eyes” theory of problem-solving.
There’s a saying that even a big problem starts to look small when multiple people are looking at it, thinking about it, and working on it together.
Early in my professional software development career, I was too proud to admit I needed help when I hit a brick wall while working on a bug or a particular piece of functionality in a software application.
I’d let a problem fester while I struggled to figure it out all by myself.
It wasn’t until later in my career that I realized how important it is to admit when you need help.
Everyone has different skillsets and knowledge. If you have a weakness in a particular technical area, you can bank on the fact that some other software engineer will have expertise in that very area.
Admitting you need help and guidance is not a sign of weakness. As a matter of fact, it’s a sign that a person cares more about fixing the problem than about saving face. It speaks volumes to others about how you work.
Learning how to deal with a crisis may not be fun, but it is absolutely essential to the success of a software engineer’s career. Sooner or later, every software engineer will encounter their own flavor of technical crisis.
Every software engineer must continue to hone their technical skills, non-technical skills, and communication skills in order to survive the fiery trials and tribulations of an IT crisis.
“Once more unto the breach, dear friends, once more!
…
But when the blast of war blows in our ears,
Then imitate the action of the tiger;
Stiffen the sinews, summon up the blood,
Disguise fair nature with hard-favour’d rage;
Then lend the eye a terrible aspect;” – Henry V, William Shakespeare