An industry’s journey to system-level predictive analytics for the data center starts now!

The data center industry has a defined list of challenges that have been well-known and well-publicized for years:

  • Increased digitization fueling increased data center demand
  • Reliable maintenance and cost-effective operations within the current context of growing sustainability and climate change concerns
  • Modernization of legacy facilities to keep pace with technology evolution and energy efficiency requirements
  • Ongoing workforce skills and talent shortage

Add to this the much-hyped developments around artificial intelligence (AI) and their implications, including further demand for data center capacity. AI requires performance-intensive computing (servers, storage, and networking), which requires more energy and produces more heat. Combining all these challenges (or opportunities) creates a more pressing call to action for the data center industry to collaborate on pragmatic and sustainable solutions.

Today’s fast-evolving market dynamics for the data center industry

The market dynamics for the data center industry are evolving at a previously unimaginable pace. Quite frankly, the data center market is exploding and is expected to continue this accelerated growth for the next few years. “IDC projects global service provider power capacity to grow from 93,908MW to 216,195MW in 2027, representing a compound annual growth rate (CAGR) of 18.4%.” This explosive demand is not only due to consumer-driven activities like content streaming. It is also a direct response to the growth of AI and the processing power needed to support AI software stacks and the corresponding outputs (e.g., models, images, and text).
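As a quick sanity check on the figures in the quote, the sketch below applies the standard CAGR formula to the two capacity values. The quote as reproduced here does not state the base year, so the code only works out the horizon implied by the quoted rate; the function names are our own and purely illustrative.

```python
import math


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


def implied_years(start: float, end: float, rate: float) -> float:
    """Years needed to grow from `start` to `end` at a constant annual `rate`."""
    return math.log(end / start) / math.log(1.0 + rate)


# Figures taken from the IDC quote above.
START_MW = 93_908
END_MW = 216_195
QUOTED_CAGR = 0.184

horizon = implied_years(START_MW, END_MW, QUOTED_CAGR)
five_year_rate = cagr(START_MW, END_MW, 5)

print(f"Horizon implied by the quoted rate: {horizon:.2f} years")
print(f"CAGR if the horizon is exactly 5 years: {five_year_rate:.1%}")
```

The implied horizon comes out at just under five years, and a five-year horizon gives a rate of roughly 18%, so the quoted numbers are broadly consistent with a five-year projection window.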

Traditional IT loads normally require data to be in close proximity to the point of data generation or consumption. Data centers supporting AI model training do not have that limitation. Because latency is not a concern with these loads, data centers primarily supporting this type of activity can be deployed in more remote locations. This gives data center operators the ability to build in markets with lower demand and energy costs, without the resource constraints we typically see in today’s primary markets such as Northern Virginia in the U.S., Paris, and Amsterdam. The tradeoff is that this flexibility in build location will create more geographically distributed data center networks.

Increasing data center capacity with new builds is only a fraction of the equation. New builds supporting AI software stacks will also need to be designed to support increased power consumption and heat removal. But what is the plan for legacy facilities? The lifespan of a data center is typically 15-20 years, whereas IT equipment is on 3–5-year refresh cycles and data center physical infrastructure is on 5–10-year refresh cycles. As data center infrastructure ages, reliability and cost performance must be managed effectively to maintain competitiveness with newer footprints. Technological advancements in the physical infrastructure of new builds already contribute to improved performance, reliability, energy efficiency, and use of space. Beyond that, operators of newer facilities are beginning to reconsider their approach to operations and maintenance. A shift in mindset is underway which recognizes that reactive preventative management, a combination of calendar-based service interventions and real-time monitoring through a building management system (BMS) or a dedicated monitoring platform for the data center infrastructure, can be costly and risky. Additionally, disparate OEM standards and data sets mean that it can be difficult to build comprehensive data analytics, because both access to the data and its quality vary.

Today’s false market assumptions

The ability to develop solutions for the challenges of the data center industry (both old and new) is being hampered by the prevalence of today’s false market assumptions. The most consequential are:

  • A single company can build predictive analytics and condition-based maintenance on its own. A solitary approach did not work for improving the energy efficiency of data centers, and it will not work for building predictive analytics and system-level condition-based maintenance. The solution to the data center energy efficiency conundrum was instead developed by members of the Green Grid, composed of data center industry leaders, who introduced the Power Usage Effectiveness (PUE) metric which became “the industry-preferred metric for measuring infrastructure energy efficiency for data centers.” The same industry-wide collaborative approach will be needed to build system-level predictive analytics and system-level condition-based maintenance.
  • Predictive analytics for the data center already exists. There is a common misconception that predictive analytics already exists and is widely adopted in the data center industry. This is most certainly a false assumption. What exists today are asset-level or component-level algorithms enabling condition-based maintenance. Today organizations can leverage data, based on asset rules and equipment threshold settings, as a guide to intervene before an equipment failure occurs at the asset level. The ability to view the data center at system level, meaning analytics across the entire system, and to predict when an equipment failure will occur and what other areas of the system will be impacted as a result, has not yet been realized, although there are some industry collaborations underway today which will bring predictive analytics closer to reality. Any single organization promising predictive analytics today, especially at a system level, is riding the industry marketing wave and further building the hype.
  • Connectivity causes cyber threats…data centers will never connect. Ensuring cyber security in the data center is paramount. However, there are ways for data centers to connect to a cloud environment without compromising on their cyber security protocols. Quite frankly, connectivity is a step all data center operators will need to embrace to reap the benefits of AI. This is not intended to be a flippant statement. Once a data center operator and their partner understand what data needs to be shared, there are cyber security best practices we would recommend implementing as part of any connectivity process including: ensuring SOC2/ISO 27001 compliance, leveraging a one-way data push with data diode if required, and executing an external penetration test, to name a few. Access to the data is essential to enable AI modeling for later inference.
  • AI vs. Humans. Although it makes for good click-bait, we fundamentally do not support the argument that AI will replace humans. This is not a binary situation; it is both/and. We will be able to drive greater efficiency and impact by marrying the power of AI with domain expertise to deliver the right outcomes. AI models still require human validation and context, and we are not yet at a juncture where we can take AI models at face value.
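To make the second point above concrete, here is a minimal sketch of what an asset-level threshold rule can look like. It is an illustration only: the class, the asset name, the metric, and the threshold values are all hypothetical and are not taken from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    """A fixed rule attached to one metric of one asset (hypothetical)."""
    asset_id: str
    metric: str
    warn_above: float
    alarm_above: float


def evaluate(rule: ThresholdRule, reading: float) -> str:
    """Classify a single reading against the rule's fixed thresholds."""
    if reading > rule.alarm_above:
        return "alarm"
    if reading > rule.warn_above:
        return "warning"
    return "ok"


# Hypothetical rule and readings, for illustration only.
rule = ThresholdRule("ups-01", "battery_temp_c", warn_above=30.0, alarm_above=40.0)
statuses = [evaluate(rule, reading) for reading in (24.5, 33.0, 41.2)]
print(statuses)
```

A rule like this can prompt an intervention on one asset, but it looks at a single metric in isolation. It says nothing about when a failure will occur or which other parts of the system would be affected, which is the gap between today's condition-based maintenance and system-level predictive analytics.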

A fresh approach to solving old problems

We need to embrace a new approach to finally solve the problems that have been plaguing the data center industry for years. Liberating ourselves from these challenges will enable us to focus on our next chapter – driven by AI. So how do we transform that vision into reality?

It begins with a shift in mindset and behavior: a willingness to engage with innovative solutions in development today that challenge industry norms. Taking a system-level approach is the only way to reach fully predictive analytics, and by using AI we can develop system-wide models. To bring together the power of digital and human, these AI models will need to be built using domain expertise. An intimate level of subject matter expertise is required to initially build the right models aligned to the business need. Over time, as more data is ingested, the models will become smarter and gain the ability to learn on their own. To achieve this vision, domain ingestion at scale is required, including past data and insights.

To start, we must understand that a data center is not a collection of assets. It is a compilation of systems and subsystems containing assets. In a redundant configuration, an asset can fail, and a system can still operate. However, a failure in the system is rarely an isolated event. Understanding the interaction between the assets in the system unlocks the key data necessary to help deliver predictive capabilities at the system level. Deriving the required actions from this key data requires the extraction of this data via secure connectivity to build the associated AI models.
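The difference between asset health and system health can be sketched in a few lines. The example below is a simplified illustration under our own assumptions: the `Subsystem` class, the status labels, and the cooling units are hypothetical, and real redundancy schemes are more involved than a simple unit count.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """A group of interchangeable assets serving one function (hypothetical)."""
    name: str
    required: int  # units needed to carry the load (the "N" in N+1)
    healthy: dict[str, bool] = field(default_factory=dict)  # asset id -> healthy?

    def spare_units(self) -> int:
        """Healthy units beyond what the load requires."""
        return sum(self.healthy.values()) - self.required

    def status(self) -> str:
        spare = self.spare_units()
        if spare < 0:
            return "down"      # not enough healthy units to carry the load
        if spare == 0:
            return "at_risk"   # still operating, but redundancy is gone
        return "redundant"


# Hypothetical N+1 cooling subsystem: two units required, three installed.
cooling = Subsystem(
    "cooling",
    required=2,
    healthy={"crah-1": True, "crah-2": True, "crah-3": True},
)
before_failure = cooling.status()

cooling.healthy["crah-3"] = False  # one asset fails
after_one_failure = cooling.status()

cooling.healthy["crah-2"] = False  # a second asset fails
after_two_failures = cooling.status()

print(before_failure, after_one_failure, after_two_failures)
```

After the first failure the subsystem still operates, yet its risk profile has changed. An asset-level view would report one failed unit, while a system-level view also reports that the next failure takes the subsystem down.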

These models will need to be validated by what is called Ground Truth Validation:

By leveraging the data-driven AI models, we will be able to offer prognostic insights: recommendations about what is wrong in a system, based on a data-driven model rather than a data-guided approach using historical data. These prognostic insights can then be shared with a service representative in advance of an on-site intervention, enabling them to arrive at a data center site with the right equipment and parts to resolve the issue in fewer visits, which in turn reduces planned downtime for the site. The service representative would then be able to confirm whether the prognostic insights were indeed accurate, and that confirmation would be fed back into the model for continuous improvement and ongoing training. Through this process, on-site staff would be able to do more with less.
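The feedback loop described above can be sketched as a simple record that starts out unconfirmed and is labeled once the service representative reports back. This is a minimal illustration under our own assumptions: the `Insight` class, the fault names, and the asset names are hypothetical, and a production system would store far more context with each record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Insight:
    """One prognostic insight issued by a model (hypothetical structure)."""
    asset_id: str
    predicted_fault: str
    confirmed: Optional[bool] = None  # None until a technician reports back


def record_ground_truth(insight: Insight, found_fault: str) -> None:
    """Label the insight with what the technician actually found on site."""
    insight.confirmed = found_fault == insight.predicted_fault


def confirmed_rate(insights: list[Insight]) -> float:
    """Share of reviewed insights that the on-site visit confirmed."""
    reviewed = [i for i in insights if i.confirmed is not None]
    if not reviewed:
        return 0.0
    return sum(i.confirmed for i in reviewed) / len(reviewed)


# Hypothetical insights, for illustration only.
insights = [
    Insight("chiller-02", "refrigerant_leak"),
    Insight("ups-01", "battery_degradation"),
    Insight("crah-3", "fan_bearing_wear"),
]
record_ground_truth(insights[0], "refrigerant_leak")  # confirmed on site
record_ground_truth(insights[1], "loose_connection")  # model was wrong
# insights[2] has not been visited yet, so it stays unlabeled.

rate = confirmed_rate(insights)
labeled = [i for i in insights if i.confirmed is not None]
print(f"{rate:.0%} of {len(labeled)} reviewed insights confirmed")
```

The labeled records serve two purposes: they measure how far the model's insights can be trusted today, and they become training data for the next iteration of the model.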

The combination of AI and domain expertise would enable the industry to transition to a truly system-level predictive maintenance approach confidently and reliably. We believe we can now reduce risk to the point that the value of a predictive maintenance approach exceeds the risk of a transition from a time-based maintenance approach, which until now has been a struggle. An additional benefit of a truly predictive maintenance approach is the capability to optimally service both systems and the assets within systems by logically grouping equipment maintenance together, guided by the abundance of data available.

Through our partnership with Compass Datacenters, Schneider has been able to develop new tools and service models which allow us to truly transition to predictive maintenance on key data center assets at the system level. Together we had a vision of transforming how data centers are operated and maintained, and today that vision has become a reality. By combining data and a statistical approach to risk, we have proved that we can reduce truck rolls, optimize interventions, and create a path forward for data center operators ready to embark on this transition to system-level condition-based maintenance. We have successfully introduced an industrialized asset framework which allows for multi-variable and cross-system data analytics to determine correlation and/or causality of anomalies. Real-time, high-frequency data streams now generate power quality information that can indicate faults originating outside of the Compass modules. We have been able to successfully leverage ambient module and localized asset temperature data coupled with power data to create a thermodynamic model for the overall module cooling health. This allows for the operating mode, weather, and dynamic IT load to be modelled, benchmarked, and optimized.

Key lessons learned on this journey

Although we are still on our journey with Compass and our evolution to predictive system-level analytics, there are some key lessons we have learned thus far which, if adopted by the broader industry, we believe would help to accelerate defining solutions to our shared challenges:

  • Change is a mindset. As industry professionals, we know that data center uptime is paramount. However, we need to balance this need with a more open mindset to change – from both a technical and financial perspective. Our industry sits at the crux of digital evolution and to keep pace, both manufacturers and data center operators must be willing to embrace change for continuous innovation.
  • Have a plan but be flexible. Defining a vision to guide you towards your organization’s ambition is crucial. However, understand that you are not working towards a “one and done” solution. Instead, it is important to understand this is an evolution and the value to be gained is through the iterative continuous journey.
  • Domain expertise comes from many places. Domain expertise is the value that people bring to the AI evolution and this expertise has a variety of sources. It is important to work across industries and all manufacturers in the value chain to gain domain expertise across the energy train.
  • There is no one-size-fits-all solution. The belief that there is a single solution will hamper your progress. It is important to align with the customer’s environment to realize value. Manufacturers can provide a “base design” or “building blocks,” so to speak, but the true value will be the modular add-ons aligned to the customer’s environment and business objectives.
  • Investing in AI is broader than equipment. The potential value of AI is greater than the software stacks and physical infrastructure. As the industry is making hardware and software investments, it is important to consider the people investment as well. We have learned that there is a need for a dedicated AI team focused on the ingestion of data at scale to develop AI training models. Further, a process needs to be in place for this AI team to closely collaborate with the domain experts providing the context.

Where we go from here

This is an exciting time for the data center industry and our next steps will have a significant impact on the future of our industry. Just as the industry united to find a solution to data center energy efficiency through PUE, the industry will need to once again join forces to solve our existing challenges as well as those posed by AI. At Schneider, we know that the journey to system-level predictive analytics and condition-based maintenance will occur over time, but there is value to be realized today. We look forward to the journey ahead with other industry leaders and invite you to stay connected and join us through our innovation experience.
