Liquid Cooling for AI Datacenters
Why is liquid cooling essential for AI datacenters?
H100 GPUs draw 700W+ per chip, making air cooling insufficient for high-density AI clusters. Liquid cooling enables rack densities of 80-100 kW (vs 10-15 kW air-cooled), PUE of 1.05-1.15 (vs 1.3-1.5 for air), and a 30-40% reduction in cooling energy. Direct-to-chip is the mainstream solution; immersion offers benefits at extreme density but requires infrastructure changes.
Key Data Points
- Rack Density: 80-100 kW (Liquid) vs 10-15 kW (Air)
- PUE Efficiency: 1.05-1.15 vs 1.3-1.5
- Energy Reduction: 30-40% cooling energy savings
- Mainstream Tech: Direct-to-Chip (D2C) Cold Plates
- Emerging Tech: Two-phase Immersion Cooling
Why AI Datacenters Need Liquid Cooling
Heat Density
An 8-GPU DGX H100 system produces up to 10.2 kW of heat, and a single 42U rack can hold four or more systems, pushing the rack past 40 kW. Air cooling tops out at roughly 15 kW/rack.
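The arithmetic is simple enough to sketch; the figures below come straight from the paragraph above:

```python
# Rack heat-density check using the figures above (illustrative only)
DGX_H100_KW = 10.2   # heat output of one 8-GPU DGX H100 system
AIR_LIMIT_KW = 15.0  # practical ceiling for an air-cooled rack

for systems in range(1, 5):
    rack_kw = systems * DGX_H100_KW
    verdict = "air-coolable" if rack_kw <= AIR_LIMIT_KW else "needs liquid"
    print(f"{systems} system(s): {rack_kw:.1f} kW/rack -> {verdict}")
```

Even a second system per rack blows past the air-cooled ceiling.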
Energy Efficiency
Liquid carries roughly 3,500x more heat than air per unit volume. This translates to a PUE of 1.05-1.15 versus 1.3-1.5 for air, reducing total facility power by 15-25%.
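A quick worked example shows where the 15-25% figure comes from (the 10 MW IT load is a hypothetical input; the PUE values are representative points from the ranges above):

```python
# Total facility power = IT load x PUE (IT load here is hypothetical)
IT_MW = 10.0
PUE_AIR, PUE_LIQUID = 1.4, 1.1   # representative values from the ranges above

air_total = IT_MW * PUE_AIR          # 14.0 MW
liquid_total = IT_MW * PUE_LIQUID    # 11.0 MW
reduction = (air_total - liquid_total) / air_total
print(f"{air_total:.1f} MW -> {liquid_total:.1f} MW, a {reduction:.0%} reduction")
```

That yields a 21% reduction in total power, consistent with the stated range.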
Density Economics
Higher density means less floor space, shorter cable runs, and lower facility costs. A liquid-cooled cluster may need 50% less space than its air-cooled equivalent.
Liquid Cooling Technologies
Direct-to-Chip (D2C)
Cold plates mount directly on GPUs and CPUs, with liquid circulating through rack-level coolant distribution units (CDUs).
Vendors: CoolIT, Asetek, Vertiv, Zutacore
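For intuition on the plumbing, the coolant flow needed through a rack's cold-plate loop follows from the standard heat-transfer relation Q = ṁ·c_p·ΔT. The 80 kW rack load and 10 °C temperature rise below are assumed values, and this is the closed secondary loop, not the facility water supply discussed later:

```python
# Secondary-loop flow for a D2C rack: Q = m_dot * c_p * dT (assumed inputs)
Q = 80_000.0     # rack heat load in watts (assumed)
CP = 4186.0      # specific heat of water, J/(kg*K)
DT = 10.0        # coolant temperature rise across the rack, K (assumed)

m_dot = Q / (CP * DT)    # mass flow, kg/s
lpm = m_dot * 60.0       # ~1 L per kg for water
print(f"{m_dot:.2f} kg/s ≈ {lpm:.0f} L/min ≈ {lpm / 3.785:.0f} GPM per rack")
```

Roughly 30 GPM of coolant per 80 kW rack at these assumptions; a tighter ΔT pushes the flow requirement higher.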
Immersion Cooling
Entire servers are submerged in dielectric fluid, which captures essentially 100% of the heat.
Vendors: GRC, LiquidCool, Submer, Iceotope
Cooling Technology Comparison
| Factor | Air Cooling | Direct-to-Chip | Immersion |
|---|---|---|---|
| Max Rack Density | 10-15 kW | 60-100 kW | 100-250+ kW |
| PUE | 1.3-1.5 | 1.05-1.15 | 1.02-1.08 |
| Capex ($/kW) | $150-300 | $200-400 | $300-500 |
| Water Usage | High (evaporative) | Medium (closed loop) | Minimal |
| Serviceability | Easy | Moderate | More complex |
| Retrofit Difficulty | N/A (baseline) | Moderate | Major overhaul |
| Best For | Legacy, low-density | AI/HPC clusters | Max density, new builds |
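Read programmatically, the density rows reduce to a simple lookup. Thresholds are taken from the table above; this is a sketch, not a sizing tool:

```python
def cooling_options(rack_kw: float) -> list[str]:
    """Technologies whose max rack density (from the table above) covers rack_kw."""
    limits = [("Air Cooling", 15), ("Direct-to-Chip", 100), ("Immersion", 250)]
    return [name for name, max_kw in limits if rack_kw <= max_kw]

print(cooling_options(12))    # ['Air Cooling', 'Direct-to-Chip', 'Immersion']
print(cooling_options(80))    # ['Direct-to-Chip', 'Immersion']
print(cooling_options(200))   # ['Immersion']
```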
Implementation Considerations
Direct-to-Chip Requirements
- CDU placement: In-row or rear-door, ~1 per 2-4 racks (see the sketch after this list)
- Water supply: 10-20 GPM per MW, chilled or facility water
- Manifolds: Quick-connect fittings at rack/server level
- Leak detection: Sensors under racks and at manifolds
- Air component: Some CRAC/CRAH capacity still needed for residual heat
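Putting the CDU and water-supply ratios together for a hypothetical pod (the pod size and per-rack load are assumptions; the ratios come from the list above):

```python
import math

# Hypothetical 32-rack, 80 kW/rack D2C pod; ratios taken from the list above
racks, rack_kw = 32, 80.0
it_mw = racks * rack_kw / 1000.0    # 2.56 MW of IT load

cdus = math.ceil(racks / 3)         # ~1 CDU per 2-4 racks (3 as a midpoint)
lo, hi = 10 * it_mw, 20 * it_mw     # 10-20 GPM of water per MW
print(f"{it_mw:.2f} MW: ~{cdus} CDUs, {lo:.0f}-{hi:.0f} GPM facility water")
```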
Immersion Requirements
- Tank sizing: Custom tanks, typically 20-40 servers each
- Fluid cost: $1-5/liter for dielectric fluid, purchased in large volumes (ballpark math after this list)
- Floor loading: Much higher than traditional racks
- Fire suppression: Different requirements vs air-cooled
- Maintenance: Drip-dry procedures and specialized training
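Fluid is often the surprise line item. A ballpark calculation, noting that the per-tank fluid volume below is an assumption and varies widely by tank design:

```python
# Ballpark dielectric fluid cost; $1-5/L from the list above, volume assumed
TANK_FLUID_L = 700    # assumed fluid volume per 20-40 server tank (varies widely)
tanks = 10            # hypothetical deployment size

for price_per_l in (1, 5):
    total = tanks * TANK_FLUID_L * price_per_l
    print(f"${price_per_l}/L: ${total:,} for {tanks} tanks")
```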
TCO Analysis: 10 MW AI Datacenter
Scenarios modeled: Air Cooling, Direct-to-Chip, Immersion.
Result: At $0.05/kWh, D2C breaks even against air in ~9 months and immersion in ~16 months; payback is faster at higher power prices.
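The payback arithmetic is straightforward. The inputs below are assumptions chosen to be consistent with the comparison table and the $0.05/kWh figure; the original model's exact inputs are not given, but these reproduce its ~9 and ~16 month results:

```python
# Breakeven ~= capex premium / monthly energy savings (assumed inputs)
IT_KW = 10_000    # 10 MW IT load
PRICE = 0.05      # $/kWh
HOURS = 730       # hours per month

def monthly_cost(pue: float) -> float:
    return IT_KW * pue * HOURS * PRICE

air = monthly_cost(1.40)
for name, pue, premium_per_kw in [("D2C", 1.10, 100), ("Immersion", 1.05, 200)]:
    savings = air - monthly_cost(pue)            # energy saved vs air, $/month
    months = premium_per_kw * IT_KW / savings    # capex premium / monthly savings
    print(f"{name}: saves ${savings:,.0f}/month, breakeven ~{months:.0f} months")
```

Because savings scale linearly with the electricity price, doubling the power price halves the breakeven time.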
Frequently Asked Questions
Can existing datacenters be retrofitted for liquid cooling?
Direct-to-chip retrofit is feasible with CDU additions and manifold installation. Immersion requires significant floor and structural changes. Many operators are adding liquid-ready infrastructure in new builds even if deploying air initially.
What about GPU warranty with liquid cooling?
NVIDIA supports liquid cooling on DGX and HGX systems. Third-party cold plates on individual GPUs may affect warranty; check with NVIDIA and the cold plate vendor. Most enterprise deployments use validated combinations.
How does liquid cooling affect colocation pricing?
Liquid-cooled colocation commands a 20-40% premium per kW due to the infrastructure requirements. However, higher density means less space is needed, so total cost may be similar or lower. Emerging liquid-ready colos are more competitive.
What is rear-door heat exchanger (RDHX)?
RDHX replaces the rack's rear door with a liquid-fed heat exchanger; in passive designs, the servers' own fans push exhaust air through the coil. It handles roughly 20-30 kW/rack, between air cooling and full D2C, and suits moderate densities without server modifications.