As I wrote in a recent piece published in Consulting-Specifying Engineer magazine, mission critical data centers are just that: critical to the business they serve. Simply put, that means they can't afford any shutdowns or even interruptions, including those for planned maintenance.
Achieving that kind of constant availability, of course, is much easier said than done, considering that humans run data centers and humans have been known to make mistakes. There's no way to completely eliminate human error, but, as I argued in the CSE piece and discussed in this Schneider Electric webcast, paying attention to two crucial elements, documentation and training, can dramatically reduce the frequency and impact of such errors.
I've touched on the documentation issue in a previous post, but it bears repeating, especially given that many documentation programs don't meet the stringent needs of a mission critical facility. It's all about sustaining the appropriate level of documentation throughout the lifecycle of your data center. In short, the documentation has to explain, in explicit detail, how to perform every operations and maintenance activity. These activities range from seemingly mundane standard operating procedures and administrative procedures to emergency operating procedures.
You may be wondering why it's important to document procedures that your folks perform every day, or multiple times per day. Surely they know what they're doing, right? Yes, they do, but without a detailed procedure to follow, staffers tend to freelance, doing things in ways they feel might be better or faster. Straying from the tried and true is a sure way to invite human error.
What's more, well-documented procedures will help in developing materials for your training program. The training program in a mission critical facility has to address not just new employees but existing employees as well. A best-practice approach is to implement a training program that aligns each site operating procedure with a specific level of certification, such as:
- Level 1: Basic knowledge and emergency response
- Level 2: Intermediate knowledge and frequent procedures
- Level 3: Advanced knowledge and infrequent procedures
- Level 4: Subject matter expertise on specific systems
The main reason many training programs prove ineffective is that organizations fail to budget for the time and expense training requires. That's shortsighted in my view. To the extent training helps employees avoid mistakes, the relatively small amount of money you spend on it can save you many times that amount by increasing uptime. Proper training will also help you lower maintenance costs, because equipment will be well maintained as a matter of course. It helps keep turnover down, too, because employees who consistently have access to training, and are thus advancing their careers, are generally happier than those who don't.
Here's one more important point: your training program is never "finished." You can always inject new lessons learned from various sources, especially the direct experiences of your own data center workforce. If an engineer finds that a particular procedure no longer works as expected, perhaps because of new or upgraded equipment, the procedure has to be changed to address the issue, and that change leads to new training.
To learn more about the topic, check out the full text of my CSE piece, "How To Effectively Operate Mission Critical Facilities."