Many, if not most, organizations rely so heavily on their data centers that those facilities must operate flawlessly 24×7. Organizations build in redundancy in an effort to avoid downtime, but equipment failures of various sorts can doom even the best-laid plans.
An effective way to avoid such failures is to implement a preventive maintenance (PM) strategy. PM involves regular inspections of data center infrastructure, including power and cooling systems, to ensure they’re functioning as expected and to make any necessary routine adjustments or part replacements. It’s also a chance to check for the signs of wear and tear that portend future equipment failures.
From my work within the Schneider Electric Global Field Services group, I see three major approaches to PM among our customers that get progressively more advanced and effective.
At the most basic level are service plans that manufacturers offer with their data center infrastructure solutions. If you have a service plan on your cooling system, for example, once or twice a year a technician will come out to check on the system and make any required updates to ensure it’s running as it should. (Or, if something goes wrong, you call them, but that’s the kind of reactive maintenance that a PM plan seeks to avoid.)
A step up from that is a plan where you identify which data center infrastructure is most critical to your organization and make sure you routinely perform maintenance on those components. This means identifying the IT applications that are most critical to the business, then determining what data center infrastructure is crucial to keeping those applications up and running. Then you can implement a plan for preventive maintenance on those components. That entails following a schedule to replace consumable components, such as UPS batteries and capacitors, which are typically not covered under your service plan but are critical to overall data center uptime. Should a capacitor fail, for example, it may put the system into bypass, exposing downstream loads to raw utility power. Likewise, a UPS with older batteries may not supply adequate run time for the transfer to generator.
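The first part of that process, tracing critical applications back to the infrastructure they depend on, can be sketched in a few lines of Python. This is only an illustration: the application names, component names, and the `critical_infrastructure` helper are all hypothetical, and a real inventory would come from your own asset records.

```python
# Hypothetical inventory: which infrastructure each application depends on.
APP_DEPENDENCIES = {
    "order-processing": {"ups-a", "crac-1", "pdu-3"},
    "email": {"ups-b", "crac-1"},
    "test-lab": {"ups-c", "crac-2"},
}


def critical_infrastructure(dependencies, critical_apps):
    """Return, in sorted order, every component that a critical app depends on."""
    components = set()
    for app in critical_apps:
        # Union in the components behind each business-critical application.
        components |= dependencies[app]
    return sorted(components)


if __name__ == "__main__":
    # Assumption: the business has named these two applications as critical.
    for component in critical_infrastructure(APP_DEPENDENCIES, ["order-processing", "email"]):
        print(component)
```

Anything the function returns is a candidate for the more rigorous PM schedule; anything it leaves out can probably stay on the basic service plan.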
When creating a PM plan, it’s important to get detailed information on the actual equipment you have installed and maintain it accordingly. Some UPS systems require capacitor replacement every 5 years, for example, while others call for a more comprehensive service after 10 years. It’s also a good idea to stock essential spare parts for any critical infrastructure so you can repair it more quickly in the event of a failure. While the repair itself would be considered reactive maintenance, preparing to do it quickly is part of the PM plan.
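To make the idea of a model-specific schedule concrete, here is a minimal Python sketch. The model names, the interval table, and the helper functions are assumptions made up for illustration; the 5- and 10-year figures simply echo the example above, and real intervals should come from the manufacturer's documentation for the exact equipment installed.

```python
from datetime import date

# Hypothetical replacement intervals in years, keyed by (model, part).
# Real values come from the manufacturer's documentation for your equipment.
REPLACEMENT_INTERVAL_YEARS = {
    ("ups-model-x", "capacitor"): 5,
    ("ups-model-y", "capacitor"): 10,
}


def next_replacement(model, part, installed):
    """Return the date on which the part is scheduled for replacement."""
    years = REPLACEMENT_INTERVAL_YEARS[(model, part)]
    try:
        return installed.replace(year=installed.year + years)
    except ValueError:
        # Installed on Feb 29 and the target year is not a leap year.
        return installed.replace(year=installed.year + years, day=28)


def is_due(model, part, installed, today):
    """Return True once the scheduled replacement date has been reached."""
    return today >= next_replacement(model, part, installed)


if __name__ == "__main__":
    installed = date(2020, 3, 15)
    print(next_replacement("ups-model-x", "capacitor", installed))
    print(is_due("ups-model-x", "capacitor", installed, date.today()))
```

Keeping the intervals in one table, keyed by model, is the point of the sketch: the same part can have very different schedules depending on the system it sits in.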
Finally, as data center infrastructure becomes more technologically advanced, another type of PM is emerging called condition-based maintenance (CBM). Increasingly, the latest infrastructure is capable of reporting on its own component status. This information helps in two ways: it provides proactive notification of any abnormalities, and it helps the technicians who perform maintenance pinpoint issues. Most importantly, if a component is beginning to wear and is showing signs of distress, that can trigger an alert indicating it’s time to replace it.
The fact is, components don’t always wear out according to a schedule. Factors such as temperature, overall environment, and IT load can dramatically shorten or lengthen the life of various systems. With CBM, you replace the parts that actually need replacing, before they cause a failure that can threaten data center availability.
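The logic behind CBM can be sketched as a simple threshold check on the status data that equipment reports. Everything here is hypothetical: the metric names, the threshold values, and the `components_needing_attention` helper are illustrative assumptions, not values or interfaces from any particular vendor.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    component: str  # which part reported the value
    metric: str     # what was measured
    value: float    # the measured value


# Hypothetical alert thresholds; real limits depend on the equipment.
THRESHOLDS = {
    "battery_internal_resistance_mohm": 6.0,
    "capacitor_temperature_c": 70.0,
}


def components_needing_attention(readings):
    """Return components whose reported condition exceeds a known threshold."""
    flagged = []
    for reading in readings:
        limit = THRESHOLDS.get(reading.metric)
        # Metrics without a configured threshold are ignored rather than guessed at.
        if limit is not None and reading.value > limit and reading.component not in flagged:
            flagged.append(reading.component)
    return flagged


if __name__ == "__main__":
    sample = [
        Reading("ups-a-battery-1", "battery_internal_resistance_mohm", 4.2),
        Reading("ups-a-battery-2", "battery_internal_resistance_mohm", 7.9),
        Reading("ups-a-capacitor-1", "capacitor_temperature_c", 55.0),
    ]
    print(components_needing_attention(sample))
```

Unlike a calendar-based schedule, nothing here depends on the part's age: a component is flagged only when its own readings suggest it is wearing out.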
Keeping data center infrastructure in top operating condition is crucial to 24×7 availability of IT systems. Building a sound PM strategy that includes the approaches explained above can help ensure your infrastructure isn’t the cause of costly downtime. To learn more, check out the free APC by Schneider Electric white paper no. 124, “Preventive Maintenance Strategy for Data Centers.”