In Part 1 I explained the problem with the “percent leakage” term and I proposed more practical metrics. In Part 2, I got into the actual testing we did in our St. Louis cooling lab with the help of Scott Buell and his awesome team! Now in Part 3, I share closing remarks and best practices for your data center design that you can use based on the data and findings from our testing.
Best practices for data center containment
I can’t emphasize enough that our test setup is very different from the data center or data room you may have. However, based on these findings, I do think the following best practices (in priority order) are ones you can replicate in your own data center:
1. Use blanking panels
The primary reason I’m listing blanking panels as #1 is that whenever 100% blanking panels were installed, we saw a significant reduction in temperature variation, regardless of the containment configuration. The secondary reason is that blanking panels are a “cheap” fix with a fast payback.
Here’s a quick financial analysis based on one of our tests:
100% blanking panels – With 100% blanking panels installed and no other containment (no doors or diffuser), we were able to reach an average IT supply temperature of 21.0°C/69.8°F with a standard deviation of 0.9°C/1.7°F, by increasing the CRAH fan to 1.32kW and chilled water valve to 62%.
78% blanking panels – In the same exact test setup with only 78% blanking panels, we increased the CRAH fan to 100% (6.1kW) and the chilled water valve to 100%. Even so, we could only bring the average IT supply temperature to 22.1°C/71.8°F, with a standard deviation of 3.7°C/6.7°F. That’s an extra 4.78kW of fan power (6.1kW minus 1.32kW) x 8760 hours/year x $0.10/kWh, or $4,187/year in energy cost.
It would have cost us $500 to $900 for the 186 blanking panels to fill the gaps in our 8-rack pod.
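The arithmetic above can be reproduced with a short sketch. It assumes, as the calculation in the text does, that the fan runs at a constant power 24/7 and that only the extra fan power counts (any chilled water savings are ignored). The helper names are hypothetical and are only for illustration.

```python
HOURS_PER_YEAR = 8760


def annual_energy_cost(extra_kw: float, price_per_kwh: float) -> float:
    """Yearly cost of an extra load that runs continuously (hypothetical helper)."""
    return extra_kw * HOURS_PER_YEAR * price_per_kwh


def payback_months(capital_cost: float, annual_savings: float) -> float:
    """Simple payback: months until savings cover the up-front cost."""
    return 12 * capital_cost / annual_savings


# Figures from the 78% vs. 100% blanking panel tests
fan_kw_78_pct_panels = 6.1
fan_kw_100_pct_panels = 1.32
extra_kw = fan_kw_78_pct_panels - fan_kw_100_pct_panels  # about 4.78 kW

cost_per_year = annual_energy_cost(extra_kw, price_per_kwh=0.10)
print(f"Extra fan energy cost: ${cost_per_year:,.0f}/year")

# Price range quoted for the 186 blanking panels
for panel_cost in (500, 900):
    months = payback_months(panel_cost, cost_per_year)
    print(f"${panel_cost} of panels pays back in {months:.1f} months")
```

Under these assumptions, the panels pay for themselves in roughly one and a half to three months. A different electricity price or duty cycle would shift that figure proportionally.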
2. Diffuse the CRAH supply jets
It’s amazing how much jets influence the air temperatures at the rack fronts.
We had this pod completely sealed with tape and cardboard, yet we still measured a difference of 2.1°C/3.8°F between the maximum and minimum temperature readings. Why? Because our CRAH had a turning vane that sent powerful jets smashing into the end of the pod. These jets cause turbulent air vortices, which create temperature differences. If possible, remove the CRAH turning vane so that the jet smashes against the floor instead. Otherwise, try placing some type of barrier in front of the CRAH. This is an issue regardless of the distribution architecture, but I believe it’s a bigger issue for hard floor architectures.
We experimented with a “T” diffuser directly in front of the CRAH as well as with a “V” shaped diffuser. Based on the data, the “V” diffuser performed best in reducing standard deviation.
3. Duct hot aisle to cooling units
The main reason I chose this as the third priority has more to do with the higher cost of materials and installation (relative to practices 1 and 2) than with our measured data. What I can say is that when we held the chilled water valve % and CRAH fan % constant, the duct alone (with missing blanking panels) appears to provide average IT supply temperatures closer to 70°F, but 100% blanking panels alone (no duct) appear to provide a lower standard deviation. The bottom line is that both practices work together as a system, and you should strive to implement both if your temperature tolerances warrant it.
Finding the optimal setting for data center design efficiency
After implementing the best practices in an existing data center, what would I do to keep my IT equipment cool and my energy bills low? Just as we found the optimal chilled water valve position and CRAH fan airflow in our testing, I would do the same for a production data center. But this requires temperature sensors on every rack. Six per rack may be too many, but if you have no way of measuring the rack-front temperatures, you’re flying blind. In other words, you won’t know the impact of your changes until your servers report high temperatures and potentially shut down. The bottom line is that you need to invest in some sort of temperature monitoring.
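Once rack-front sensors are in place, the two numbers this post keeps returning to (average IT inlet temperature and standard deviation) are simple to compute. The sketch below assumes readings arrive as a mapping from a rack name to its sensor values in °C. It uses the population standard deviation (`statistics.pstdev`); whether our lab results used population or sample standard deviation is an assumption here. It also shows why a temperature *difference* converts to °F by multiplying by 1.8 with no +32 offset.

```python
import statistics


def rack_front_stats(readings_c: dict[str, list[float]]) -> dict[str, float]:
    """Summarize all rack-front inlet readings (°C) across the data center.

    Assumption: every sensor is weighted equally, and the population
    standard deviation is used.
    """
    temps = [t for rack in readings_c.values() for t in rack]
    return {
        "mean_c": statistics.mean(temps),
        "stdev_c": statistics.pstdev(temps),
        "spread_c": max(temps) - min(temps),  # max minus min reading
    }


def delta_c_to_f(delta_c: float) -> float:
    """Convert a temperature difference (not an absolute temperature) to °F."""
    return delta_c * 1.8


# Hypothetical readings: three sensors (bottom, middle, top) on two racks
readings = {
    "rack_A": [20.5, 21.0, 21.5],
    "rack_B": [21.0, 22.0, 23.0],
}
stats = rack_front_stats(readings)
print(f"mean {stats['mean_c']:.1f}°C, "
      f"std dev {stats['stdev_c']:.2f}°C ({delta_c_to_f(stats['stdev_c']):.2f}°F), "
      f"spread {stats['spread_c']:.1f}°C")
```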
1. Decrease chilled water valve setpoint
It’s likely a good assumption that the chilled water valve % opening settings were high before the best practices were implemented. If this is true, then my average IT inlet temperature has likely decreased significantly after implementing the best practices. So, I would begin slowly closing the valve on each CRAH unit, wait for steady state, and calculate the average IT inlet temperature across the data center. Continue this until you get close to your target IT inlet temperature.
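The valve procedure above amounts to a simple step-down loop. In this minimal sketch, the hypothetical function `read_avg_inlet_c(valve_pct)` stands in for the real-world steps “set the valve, wait for steady state, then average every rack-front sensor.” The linear `toy_plant` model is invented purely so the example runs. It is not derived from our lab data.

```python
from typing import Callable


def step_down_valve(
    read_avg_inlet_c: Callable[[float], float],
    start_pct: float,
    target_c: float,
    step_pct: float = 2.0,
    min_pct: float = 0.0,
) -> float:
    """Close the chilled water valve in small steps.

    Stops at the last position whose steady-state average IT inlet
    temperature does not exceed target_c.
    """
    valve = start_pct
    while valve - step_pct >= min_pct:
        trial = valve - step_pct
        if read_avg_inlet_c(trial) > target_c:
            break  # the next step would overshoot the target, so stay here
        valve = trial
    return valve


def toy_plant(valve_pct: float) -> float:
    """Hypothetical model: less valve opening means warmer IT inlets (°C)."""
    return 30.0 - 0.15 * valve_pct


setting = step_down_valve(toy_plant, start_pct=100.0, target_c=21.1)
print(f"Valve setpoint: {setting:.0f}%")
```

Small steps with a wait for steady state between them trade tuning time for safety. Each move is small enough that an overshoot shows up as a modest rise in average inlet temperature rather than a thermal alarm.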
2. Decrease CRAH fan speed / airflow setpoint
Again, it’s likely that the CRAH airflow settings were high before the best practices were implemented. If true, then my standard deviation has likely decreased significantly after implementing the best practices. So again, I would slowly decrease the fan speed for each CRAH, wait for steady state, and calculate the standard deviation of the temperatures across the data center. Continue this until you get close to your target standard deviation.
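The fan step is the same kind of loop, but the stopping criterion is the standard deviation rather than the average. The sketch below also estimates fan power with the idealized fan affinity law (power proportional to speed cubed). That law is a textbook approximation, and real fans and drives will deviate from it. `toy_stdev` is a made-up model, and rating the fan at 6.1kW for 100% speed simply borrows the figure from our blanking panel test.

```python
from typing import Callable


def step_down_fan(
    read_stdev_c: Callable[[float], float],
    start_pct: float,
    target_stdev_c: float,
    step_pct: float = 5.0,
    min_pct: float = 30.0,
) -> float:
    """Lower CRAH fan speed until the rack-front std dev reaches its target."""
    fan = start_pct
    while fan - step_pct >= min_pct:
        trial = fan - step_pct
        if read_stdev_c(trial) > target_stdev_c:
            break  # slowing further would exceed the tolerated variation
        fan = trial
    return fan


def fan_power_kw(rated_kw: float, speed_pct: float) -> float:
    """Idealized affinity law: fan power scales with the cube of speed."""
    return rated_kw * (speed_pct / 100.0) ** 3


def toy_stdev(fan_pct: float) -> float:
    """Hypothetical model: less airflow means more temperature variation (°C)."""
    return 0.4 + 60.0 / fan_pct


speed = step_down_fan(toy_stdev, start_pct=100.0, target_stdev_c=1.5)
print(f"Fan speed {speed:.0f}%: about {fan_power_kw(6.1, speed):.2f} kW "
      f"vs {fan_power_kw(6.1, 100.0):.2f} kW at full speed")
```

The cube law explains why this step matters. Even a modest cut in fan speed produces a disproportionately large drop in fan power.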
What I’ve just gone through is easy enough for a small data center with only a few CRAHs, but with many CRAHs, it’s impractical to do manually. There are solutions on the market that automatically adjust CRAH fan airflow to minimize energy while maintaining an IT temperature range. If you want to save more than just fan energy, you need to increase your chilled water supply temperature, but this can get complicated and is outside the scope of this blog. This is likely one reason why Google invested in a machine-learning-based control system to minimize the energy consumption of its data centers.
Closing remarks – get it right for your data center
Now let’s bring this home to the original point of this blog: how much temperature variation can your IT equipment tolerate, or are you comfortable with? If you set your tolerance too narrow, how much CRAH energy are you willing to spend to get there?
Case 5 gave us a standard deviation roughly 50% higher than the ideal “airtight” case of 0.4°C/0.8°F. However, if we increase the fan speed, we can decrease this standard deviation. How much are you willing to spend on extra fan energy to decrease deviation? Or are you willing to bust out the gaff tape and cardboard? Based on our specific configuration and measured data, here’s a table that shows how much we needed to increase fan power (for Case 5) to bring the standard deviation closer to the ideal case of cardboard and gaff tape.
The moral of the story is, if your equipment can tolerate a higher temperature variation, then perhaps it’s not worth spending the extra money on fan power to lower the standard deviation (or for that matter on doors, or gaskets).
Please leave a comment to let me know what you think about this blog or if you have any questions the Science Center can help answer for you.
Conversation
Muhammad Naveed Saeed
5 years ago
I must say, this is an excellent effort to bring forth the underlying causes and possible remedies of temperature variations in front of the rack, especially “the CRAH supply jets”, which are usually not considered in evaluations. A simple way to calculate the ROI of blanking panels would be helpful for data center operations teams.
Apparently, the experimentation and research are based on static/constant load conditions in the racks. It would be interesting to know the results under dynamic load conditions (simulating actual IT load). Another major cause of hot air short cycling is the vertical gap inside racks, especially network racks (750mm/800mm). Average-quality cable management at the rear of the rack can easily push 10-15% of the hot air into the cold aisle. That can cause serious temperature variations in front of that rack and adjacent racks. Vertical blanking panels can be a possible solution.