AI workloads and direct-to-chip liquid cooling: The future of HPC

Artificial intelligence (AI) is driving transformative changes in just about every industry, enhancing everything from predictive analytics in finance to diagnostic tools in healthcare. These advancements rely on complex algorithms and large datasets used by large language models (LLMs) that significantly increase computational demands. As AI applications multiply, they are pushing the boundaries of high-performance computing (HPC), requiring both advanced thermal management for LLM workloads and direct-to-chip infrastructure solutions.

Advanced thermal management for LLM workloads

Challenges of heat generation in AI and HPC environments

AI and HPC environments generate far more heat than traditional computing, posing significant challenges for AI data centers. The average rack power density today is around 15 kW/rack, but AI and LLM workloads will drive densities to between 60 and 120 kW/rack, according to industry forecasts.

Traditional air-cooling methods will fall short, especially with the high thermal loads produced by modern AI processors. Effective thermal management for LLM workloads is crucial for maintaining optimal performance and reliability. Overheating can cause equipment failures and reduce the lifespan of critical components.
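To give a sense of scale for the rack densities quoted above, here is a minimal sketch of the standard sensible-heat relation (heat load = density × specific heat × airflow × temperature rise), solved for airflow. The air properties are approximate textbook values, and the 15 K air temperature rise across the rack is an assumption chosen for illustration, not a figure from this article. The function and constant names are hypothetical.

```python
# Rough estimate of the airflow needed to carry away a rack's heat load.
# All values below are illustrative assumptions, not vendor data.
AIR_DENSITY = 1.2            # kg/m^3, approximate near room temperature
AIR_SPECIFIC_HEAT = 1005.0   # J/(kg*K), approximate
M3S_TO_CFM = 2118.88         # cubic metres per second -> cubic feet per minute


def airflow_m3_per_s(rack_power_kw: float, delta_t_k: float = 15.0) -> float:
    """Airflow required so the air warms by delta_t_k while absorbing the load."""
    power_w = rack_power_kw * 1000.0
    return power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_k)


if __name__ == "__main__":
    # 15 kW is today's average cited above; 60 and 120 kW are the forecast range.
    for kw in (15, 60, 120):
        flow = airflow_m3_per_s(kw)
        print(f"{kw:>4} kW rack: {flow:5.2f} m^3/s (~{flow * M3S_TO_CFM:,.0f} CFM)")
```

Under these assumptions the required airflow scales linearly with rack power, so a 120 kW rack needs eight times the airflow of a 15 kW rack at the same temperature rise.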

Figure: Direct-to-chip liquid cooling concept

Benefits of direct-to-chip liquid cooling solutions

How liquid cooling outperforms air cooling

Direct-to-chip infrastructure solutions that use liquid cooling to dissipate heat deliver several advantages over traditional air cooling. Chief among them: liquid cooling can be up to 3,000 times more effective than air at capturing heat. Other benefits include:

  • Energy Efficiency: Liquid cooling uses significantly less energy than traditional data center cooling methods.
  • Peak Performance: Systems can sustain optimal performance and scale to high wattage per processor.
  • Safety: Cooling solutions that use water-based and non-conductive fluids offer safer environments for both personnel and equipment.
  • Environmental Impact: Innovative direct-to-chip solutions can help reduce carbon emissions, save energy, and cut noise pollution. This creates a domino effect, reducing dependence on traditional air-cooling systems such as chillers, condensers, and computer room air conditioners and air handlers (CRACs/CRAHs).

By directly absorbing and dissipating heat from the hottest components, such as GPUs and AI chipsets, liquid cooling provides superior thermal management. It maintains AI processors at optimal operating temperatures, enhancing performance and reliability.
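The article does not say how the "up to 3,000 times" figure is derived. One common way to arrive at a number of that order is to compare volumetric heat capacity (density × specific heat) for water and air, which the sketch below does. It assumes pure water at roughly room temperature and approximate textbook properties; real coolants such as glycol mixtures carry somewhat less heat per unit volume. The 10 K coolant temperature rise and all names are hypothetical.

```python
# Compare how much heat a cubic metre of water vs. air absorbs per kelvin.
# Property values are approximate and assume pure water, not a glycol mix.
WATER = {"density": 998.0, "specific_heat": 4182.0}   # kg/m^3, J/(kg*K)
AIR = {"density": 1.2, "specific_heat": 1005.0}       # kg/m^3, J/(kg*K)


def volumetric_heat_capacity(fluid: dict) -> float:
    """Heat absorbed per cubic metre per kelvin, in J/(m^3*K)."""
    return fluid["density"] * fluid["specific_heat"]


def water_flow_l_per_min(heat_kw: float, delta_t_k: float = 10.0) -> float:
    """Water flow needed to absorb heat_kw with a coolant rise of delta_t_k."""
    mass_flow_kg_s = heat_kw * 1000.0 / (WATER["specific_heat"] * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / WATER["density"]
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> litres per minute


if __name__ == "__main__":
    ratio = volumetric_heat_capacity(WATER) / volumetric_heat_capacity(AIR)
    print(f"Water holds about {ratio:,.0f}x more heat per unit volume than air")
    print(f"120 kW rack: about {water_flow_l_per_min(120):.0f} L/min of water")
```

With these inputs the ratio comes out at a few thousand, the same order of magnitude as the figure quoted above, which is why a modest liquid flow can replace a very large volume of moving air.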

Schneider Electric's cutting-edge thermal management solutions

Coolant Distribution Units (CDUs)

CDUs from Motivair by Schneider Electric efficiently distribute coolant throughout the data center. They support cooling capacities from 105 kW to 2.3 MW, which is essential for managing the thermal loads of LLM workloads and AI processors.
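As a back-of-the-envelope illustration of what that capacity range means, the sketch below divides a CDU's rated capacity by a per-rack heat load. It is a simplification under stated assumptions: it treats the full rack load as liquid-cooled, ignores redundancy and piping losses, and holds back a 20% headroom that is an arbitrary illustrative value, not a Schneider Electric or Motivair recommendation. The function name is hypothetical.

```python
import math


def racks_per_cdu(cdu_capacity_kw: float, rack_load_kw: float,
                  headroom: float = 0.2) -> int:
    """Whole racks one CDU could serve with a fraction of capacity held in reserve.

    headroom is an assumed design margin (0.2 = keep 20% of capacity unused).
    """
    if rack_load_kw <= 0:
        raise ValueError("rack_load_kw must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable_kw = cdu_capacity_kw * (1 - headroom)
    return math.floor(usable_kw / rack_load_kw)


if __name__ == "__main__":
    # Capacity endpoints (105 kW, 2.3 MW) and rack densities (60, 120 kW)
    # are the figures quoted in this article.
    for capacity_kw in (105, 2300):
        for rack_kw in (60, 120):
            n = racks_per_cdu(capacity_kw, rack_kw)
            print(f"{capacity_kw:>5} kW CDU, {rack_kw:>3} kW racks: {n} rack(s)")
```

Actual sizing depends on flow rates, supply temperatures, and redundancy design, so a calculation like this is only a starting point for a conversation with the vendor.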

The ChilledDoor® Rear Door Heat Exchangers (RDHx)

This innovative product provides on-demand cooling by using liquid-cooled air to remove heat from high-wattage processors. Its cooling capacity of up to 75 kW ensures efficient temperature regulation.

Dynamic® Cold Plates

Designed for scalability, these components in direct-to-chip infrastructure solutions deliver exceptional performance. They can cool processors with thermal outputs of 1,500 watts and above. Supported processors include those from AMD, NVIDIA, and Intel, as well as custom silicon.

In-Rack Manifolds

Custom stainless-steel manifolds used in direct-to-chip infrastructure solutions enable circulation between cold plate loops and CDUs, ensuring seamless integration and efficient coolant distribution within the data center.

Oil-free centrifugal chillers

As data center cooling requirements continue to rise, efficiency is becoming just as important as capacity. These free cooling, air-cooled, and water-cooled solutions help save energy and cut emissions, providing up to 2.5 MW of cooling capacity and setpoints of up to 33°C (about 91°F) for air and liquid cooling applications.

Innovations and predictions for AI cooling solutions

As AI continues to evolve, demand for efficient thermal management will continue to grow. Schneider Electric, with Motivair, is committed to staying ahead of these emerging demands by innovating and improving direct-to-chip infrastructure solutions. Sustainable cooling solutions will play a critical role in the future of AI and HPC, ensuring these systems remain efficient and reliable. Experts predict that the integration of advanced cooling solutions will be pivotal in supporting the next generation of AI advancements.

Advancing with AI

Ensuring reliable AI performance with advanced thermal management

The pairing of AI and direct-to-chip infrastructure solutions represents a perfect synergy for high-performance computing. As AI applications grow and computational demands increase, advanced cooling solutions become crucial. Schneider Electric’s cutting-edge liquid cooling products provide the thermal management needed to keep LLM workloads and AI systems running smoothly and efficiently. For those looking to enhance the reliability and performance of their AI and HPC environments, Schneider Electric offers comprehensive solutions to address these challenges head-on.
