Proper maintenance is crucial to the reliable operation of any data center, but too often companies approach it in a piecemeal, haphazard fashion. They fail to realize that a standardized, repeatable maintenance strategy covering all aspects of the data center is central to operational excellence and leads to higher uptime.
If you think about what goes into maintenance, you can quickly see the issue. Schneider Electric estimates that the average service activity takes 2.5 hours just to schedule with an end user and coordinate with the site and technician. Extrapolate that number across all the activities required to maintain your data center, and it’s clear that managing a proper maintenance program can quickly become a monumental task.
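To make that extrapolation concrete, here is a rough sketch in Python. The 2.5-hour figure is the Schneider Electric estimate cited above; the activity names and yearly visit counts are purely hypothetical placeholders, so substitute your own schedule.

```python
# Coordination overhead per service activity, in hours
# (the Schneider Electric estimate cited in the text).
HOURS_PER_ACTIVITY = 2.5


def annual_coordination_hours(activities_per_year, hours_per_activity=HOURS_PER_ACTIVITY):
    """Return the hours spent per year on scheduling and coordination alone."""
    return activities_per_year * hours_per_activity


# Hypothetical maintenance schedule: service visits per year by system.
# These counts are illustrative assumptions, not recommendations.
visits_per_year = {
    "ups_inspection": 4,
    "generator_test": 12,
    "cooling_service": 8,
    "battery_check": 4,
    "fire_suppression_check": 2,
}

total_visits = sum(visits_per_year.values())
print(total_visits, "visits need about",
      annual_coordination_hours(total_visits), "hours of coordination per year")
```

Even this small, made-up schedule adds up to nearly two working weeks of coordination before any maintenance is actually performed.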
But it’s also an extremely important one. As companies evaluate their critical infrastructure, they need to examine the basis for the maintenance decisions they make and the outcomes they expect from the maintenance they perform. In theory, equipment maintenance is designed to extend service life and reduce the chance of an outage caused by systematic failure. In practice, poorly performed maintenance that lacks a proper methodology can actually introduce risk and downtime.
In many organizations we find that critical infrastructure portfolios are not maintained with consistent standards, methods, scopes and processes. Where IT and facilities personnel are separated, we often find a large disconnect in how the two groups co-exist and support each other in accomplishing their missions. Responsibility for the infrastructure that supports critical data may lie with facilities, while IT is responsible for the devices that this infrastructure supports. Ultimately, while both groups know their mission, they aren’t always working together to accomplish it.
I offer three suggestions for how to fix the problem and reduce your risk of failure due to lax maintenance.
First, recognize that the leading cause of failure in critical environments is human intervention. The way to reduce such errors is to have people follow detailed documentation for pretty much everything they do. That means having comprehensive methods of procedure (MOPs) and standard operating procedures (SOPs). Such documentation makes clear the scope of any maintenance procedure and exactly how to perform it, so that it has no ill upstream or downstream effects. In short, proper maintenance documentation reduces risk.
Second, when it comes time to prioritize your operational budget, make sure maintenance gets its due. Remember that modest spending on maintenance will not only improve uptime but also prolong the life of your data center equipment and infrastructure. Make decisions based on empirical data, not just assumptions. Many maintenance activities today can support predictive failure analysis that relies on data points collected during the maintenance process rather than on expected lifecycles. Examples include generator oil, coolant and fuel sampling, battery trending and vibration testing. It’s like how some cars now tell you how much life is left in your oil on a percentage basis, so you can change it when it’s really necessary, not just every 3,000 miles. When making data center purchase decisions, look for products that can deliver the kind of information that’ll help you make more informed decisions and stretch your maintenance budget.
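As a simple illustration of the trending idea, the sketch below fits a straight line to a series of readings and estimates how long until the trend reaches an alarm level. Everything specific here is an assumption made for illustration: the readings, the 125 percent threshold, and the choice of a linear fit are hypothetical, and real battery or vibration analysis should follow the equipment manufacturer’s guidance.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares slope and intercept for paired readings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_threshold(months, readings, threshold):
    """Estimate months from the last reading until the trend reaches threshold.

    Returns None when the readings are flat or falling, because the fitted
    line then never climbs to the threshold.
    """
    slope, intercept = linear_trend(months, readings)
    if slope <= 0:
        return None
    crossing_month = (threshold - intercept) / slope
    return max(0.0, crossing_month - months[-1])


# Hypothetical quarterly readings of battery internal resistance,
# expressed as a percentage of the value measured at installation.
months = [0, 3, 6, 9, 12]
resistance_pct = [100, 102, 104, 106, 108]

# Hypothetical alarm level; use the limit your battery vendor specifies.
ALARM_PCT = 125

remaining = months_until_threshold(months, resistance_pct, ALARM_PCT)
print("Estimated months until alarm level:", remaining)
```

The point is the same as the oil-life gauge: replacement is scheduled from what the measurements show, not from a fixed calendar interval.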
A third tip is to consolidate and standardize your vendors as much as you can. That will make it easier to consistently apply best practices across the data center, which is crucial for operational excellence.