The IT Operations Hangover


Hey, there's a tiger in the bathroom! Finding all kinds of surprises in your data center? Maybe the number and speed of changes are getting out of hand, and you feel like you're nursing an IT Operations hangover.

What do I mean?

Remember the movie The Hangover? Four guys go to Las Vegas to celebrate their buddy's upcoming wedding, and they get carried away with some wild partying. Jump to the next morning: three of them wake up in a Vegas suite with a tiger in the bathroom, a crying baby, a missing tooth, Mike Tyson ready to kick their ass, and their buddy, the groom, is gone.

They figure that if they can piece together what happened the night before, they can find their friend and get him to his wedding. Of course, as they dig into this hangover-induced twilight zone, they keep uncovering things they had no idea had happened, which only complicates their predicament.

This is a lot like what can happen in IT Operations.

No, system admins aren't getting blind drunk and blacking out on what happens in their IT environments, but they can arrive at work in the morning to find that yesterday's smoothly functioning infrastructure is now a complete mess. All they can say is "What happened?" Just like in the movie. OK, not quite like in the movie, but in the same spirit. IT ends up spending a lot of time investigating and discovering all the changes made to the infrastructure, and then trying to figure out which of those changes affected operations.

Modern Data Center Complexity and Dynamics

With the maturity of the modern data center and the advent of virtualization, these changes aren't just happening overnight, but throughout the day. The IT landscape has grown more complex: it supports a wide and growing range of technologies and platforms (virtualization, cloud, open source, etc.) along with accelerated application and software deployment schedules. A typical environment includes thousands of different configuration parameters. It's no longer just a question of whether the environment worked yesterday; any change can affect its stability at any point during the day.

Performance Slips and Where to Start Looking

Trying to keep the IT infrastructure under control and at peak performance, while chasing down the risky changes that slip in, can feel a lot like that hazy, groggy hangover with little recollection of what happened earlier. The difference is that in the movie the guys follow a linear trail of clues from one crazy situation to the next, while in the data center you don't have a clue. When performance slows or an incident occurs, you can be confronted with hundreds or even thousands of configuration parameters and have to sift through them to find what had an impact, and where. That is a tedious process, and given the time pressure and the limited staff available to review the data, it is close to impossible (some say it would take an IT team the size of Cleveland to regularly go over all the configuration parameters in the system).
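
To make the "sift through thousands of parameters" problem concrete, here is a minimal Python sketch, assuming each configuration snapshot can be flattened into a dictionary that maps a parameter path to its value. Diffing yesterday's snapshot against today's reduces thousands of parameters to the handful that actually changed, and an optional list of critical path prefixes pushes the riskiest changes to the top. The function names, parameter paths, and prefix list are hypothetical illustrations, not any particular product's API.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical model: a snapshot maps a flattened configuration parameter
# path (host/component/setting) to its value at the time it was collected.
Snapshot = Dict[str, Any]
# (change_type, path, old_value, new_value)
Change = Tuple[str, str, Optional[Any], Optional[Any]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> List[Change]:
    """List every parameter that was added, removed, or changed, sorted by path."""
    changes: List[Change] = []
    for path in sorted(before.keys() | after.keys()):
        if path not in after:
            changes.append(("removed", path, before[path], None))
        elif path not in before:
            changes.append(("added", path, None, after[path]))
        elif before[path] != after[path]:
            changes.append(("changed", path, before[path], after[path]))
    return changes


def rank_by_criticality(changes: List[Change], critical_prefixes: List[str]) -> List[Change]:
    """Move changes under any 'critical' path prefix to the front.

    The prefix list is an illustrative stand-in for a real criticality model.
    sorted() is stable, so path order is preserved within each group.
    """
    def is_critical(change: Change) -> bool:
        return any(change[1].startswith(prefix) for prefix in critical_prefixes)

    return sorted(changes, key=lambda change: not is_critical(change))


# Example: two hypothetical nightly snapshots of a tiny environment.
yesterday = {
    "web01/jvm/Xmx": "4g",
    "web01/sshd/PermitRootLogin": "no",
    "db01/postgres/max_connections": 200,
}
today = {
    "web01/jvm/Xmx": "2g",
    "web01/sshd/PermitRootLogin": "no",
    "db01/postgres/max_connections": 200,
    "web01/cron/cleanup": "*/5 * * * *",
}

for change in rank_by_criticality(diff_snapshots(yesterday, today), ["web01/jvm/"]):
    print(change)
```

A real tool would also have to collect the snapshots, cope with nested and platform-specific configuration formats, and decide which changes actually matter; the sketch only shows how comparing against a known-good baseline narrows the search from every parameter to just the ones that changed.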

Time Is Running Out

Well, you may not have a friend about to get married, but you don't have time for tedious checking either: with the stability and availability of your IT environment on the line, the business is at risk. So you need to get to the bottom of the matter, and fast. Moreover, you need to be able to "see around the corner" and stay aware of configuration drift in your environments. If this were a hangover, we'd be talking about some pretty strong coffee. You need to efficiently crawl vast amounts of changing configuration data (without employing the entire city of Cleveland) and analyze it in real time for criticality, so that your IT team stays aware of what is happening and what the likely cause of a problem is. It's not enough to stop the problem; you need to understand its causes and address them with accurate knowledge of where changes occurred and what impact they had. And if you think it will end after one bad night, you're mistaken.

Just look at The Hangover: there is already a sequel out, and once again the guys are in big trouble, trying to figure out what happened to them (or really, what changed) last night... again.

