On a sunny day in March, Susan, a data center manager for a medium-sized company, received an alert: a “door open” alarm on rack #43 in the corporate data center. She immediately called the data center network operations center (NOC) and asked whether anyone had done a visual inspection, which is standard operating procedure. The on-site staff reported that it was a false alarm and that everything appeared okay. With the situation resolved, Susan asked if they were ready to perform the 10-server “swap out” planned for that evening. She was told that the servers had arrived and that the in-house server tech was prepping them for installation later that evening.
This is an example of competent people all doing their jobs as outlined, successfully managing the data center to the required level of resiliency. It’s also the current industry norm, as the Uptime Institute recommends that multiple qualified staff members be on-site at all times to achieve the highest levels of Tier III or IV certification. However, the requirement for on-site staff is now being challenged by recent events, prompting data center operators to work constructively and efficiently in a more remote, digitized way.
Impacts to the New Workplace
Most companies, even those deemed essential businesses, adopted strict work-from-home policies as the COVID-19 pandemic swept across the world. This caused a few issues to bubble to the top:
- Companies that relied on on-site data center support staff soon realized they had limited or no visibility into their data center operations, because their staff could monitor the data center only from on-site.
- Many companies that had been proactive and had previously deployed remote-capable DCIM (data center infrastructure management) quickly discovered gaps in their coverage.
- Cloud migration projects that were seen as “low priority” were moved to “high priority” or even “top of the list” priority.
Many companies were ill-prepared this time, and they will most assuredly want to be prepared next time. This desire for readiness points to some very specific strategies that the majority of companies should implement:
- Outsource data center functions to cloud providers or colocation companies that are in the business of guaranteeing uptime of critical applications during normal times and even during a crisis.
- Make in-house data centers as “lights out” as possible by monitoring and performing maintenance and upgrade functions remotely or through automation.
Next Generation DCIM is Cloud-based and Leverages AI
The data center industry had already begun entering a new generation of data center management before the crisis. Next-generation DCIM is cloud-based and leverages AI to provide predictive analytics, performance benchmarking, useful recommendations, and automatic links to service dispatch. Another feature is remote management from virtually any device – phone, tablet, or home PC. As a result, the quality of alerts and recommendations is much higher, and the ability to receive these notifications is greatly enhanced.
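To make the idea of analytics-driven alerts with a link to service dispatch a little more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular DCIM product works: the function name `check_reading`, the `Alert` record, the action labels, and both thresholds are hypothetical, and a simple deviation-from-baseline check stands in for the far richer models a real platform would use.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class Alert:
    sensor: str
    value: float
    baseline: float
    action: str  # hypothetical labels: "notify_remote_operator" or "dispatch_service"


def check_reading(
    sensor: str,
    history: Sequence[float],
    value: float,
    notify_z: float = 3.0,     # assumed threshold for a remote notification
    dispatch_z: float = 6.0,   # assumed threshold for sending a technician
) -> Optional[Alert]:
    """Compare a new reading with recent history and suggest an action.

    Returns None when the reading is within the normal range or when
    there is too little history to judge.
    """
    if len(history) < 2:
        return None

    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is treated as notable.
        z = 0.0 if value == baseline else float("inf")
    else:
        z = abs(value - baseline) / spread

    if z <= notify_z:
        return None
    action = "dispatch_service" if z > dispatch_z else "notify_remote_operator"
    return Alert(sensor, value, baseline, action)


# Example: inlet temperature readings (degrees C) for one rack.
recent = [22.0, 22.5, 21.8, 22.2, 22.1]
normal = check_reading("rack43-inlet-temp", recent, 22.3)
warm = check_reading("rack43-inlet-temp", recent, 23.2)
hot = check_reading("rack43-inlet-temp", recent, 30.0)
```

In this sketch the small drift produces no alert, the moderate rise is routed to a remote operator's phone or laptop, and the large jump is escalated to a service dispatch. The point is only that the decision can be made without anyone standing in front of the rack.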
Now we must look at ways to do server replacements without qualified on-site staff, in line with the industry’s goal of creating “lights out,” unmanned data centers. At first glance, you may say that would be impossible. But again, the world is in a different place. We are in the Zoom, Teams, and FaceTime era. And “lights out” and unmanned may not be entirely accurate, as security staff will most likely be on-site. But what if part of the hiring process for security guards included basic mechanical capabilities and a willingness to do plug-and-play swap outs? Companies are already experimenting with “Zoom guided” maintenance and repairs.
Another idea is using robots to automate data center operations, which has been the object of years of science experiments and proofs of concept. Some companies have demonstrated a robotic system that can replace servers housed in IT racks in data centers. Robots generally excel at doing a precise, repetitive task very quickly. Imagine a data center with servers stacked in one spot that need to be inserted in another spot. Once the robot learns the best trajectory, it can execute this action millions of times without mistakes or collisions. However, if the task is changed even slightly (for example, a larger server or a different position for the power or network connection), the robot will be unable to complete it. In the future, though, robots could be more adaptive and better suited to dealing with small variances.
Managing the Data Center in a “Lights Out” Environment
The ongoing global health crisis has exposed the need for “lights out” data centers to maintain availability, as well as to ensure the safety of employees. Next generation DCIM and automation enable data centers to run with minimal or no qualified staff on-site during normal operations, emergencies or crises. For many years, I have been involved with discussions about “lights-out” data centers, but now we are quickly moving beyond mere talk into actuality.
Learn more about how technologies like our cloud-based data center management solution, EcoStruxure™ IT, can equip data centers to maintain operations in a “lights out” environment.