
Liquid Cooling for AI Datacenters


Why is liquid cooling essential for AI datacenters?

H100 GPUs draw up to 700 W per chip, which makes air cooling insufficient for high-density AI clusters. Liquid cooling enables rack densities of 80-100 kW (vs 10-15 kW air-cooled), a power usage effectiveness (PUE) of 1.05-1.15 (vs 1.3-1.5 for air), and a 30-40% reduction in cooling energy. Direct-to-chip is the mainstream solution; immersion suits extreme densities but requires more extensive infrastructure changes.

Key Data Points

  • Rack Density: 80-100 kW (Liquid) vs 10-15 kW (Air)
  • PUE: 1.05-1.15 (Liquid) vs 1.3-1.5 (Air)
  • Energy Reduction: 30-40% cooling energy savings
  • Mainstream Tech: Direct-to-Chip (D2C) Cold Plates
  • Emerging Tech: Two-phase Immersion Cooling

Why AI Datacenters Need Liquid Cooling

Heat Density

An 8-GPU DGX H100 draws up to 10.2 kW, nearly all of which ends up as heat. A single 42U rack holding four or more systems therefore exceeds 40 kW, while air cooling is limited to ~15 kW/rack.
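
The rack arithmetic above can be checked with a short sketch. The 10.2 kW per system and the ~15 kW air-cooling ceiling come from the text; the function names and the choice of four systems per rack are illustrative assumptions.

```python
DGX_H100_KW = 10.2   # per-system heat load quoted above
AIR_LIMIT_KW = 15.0  # approximate air-cooling ceiling per rack quoted above


def rack_heat_kw(systems_per_rack: int, system_kw: float = DGX_H100_KW) -> float:
    """Total heat load of one rack, in kW."""
    return systems_per_rack * system_kw


def max_air_cooled_systems(system_kw: float = DGX_H100_KW,
                           limit_kw: float = AIR_LIMIT_KW) -> int:
    """How many systems fit in a rack before the air-cooling ceiling is hit."""
    return int(limit_kw // system_kw)


load = rack_heat_kw(4)
print(f"{load:.1f} kW per rack, {load / AIR_LIMIT_KW:.1f}x the air limit")
print(f"Systems per air-cooled rack: {max_air_cooled_systems()}")
```

With these inputs a four-system rack lands at 40.8 kW, and an air-cooled rack can host only one such system.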

Energy Efficiency

Water can carry roughly 3,500x more heat per unit volume than air. This translates to a PUE of 1.05-1.15 vs 1.3-1.5 for air, reducing total facility power consumption by 15-25% for the same IT load.
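
As a rough check on how PUE maps to total power, the sketch below assumes total facility power equals IT load times PUE, so the saving for a fixed IT load depends only on the ratio of the two PUE values. The function name is hypothetical, and the sample inputs (1.40 and 1.10) are the values used later in the TCO comparison.

```python
def total_power_reduction(pue_air: float, pue_liquid: float) -> float:
    """Fractional drop in total facility power for the same IT load."""
    if pue_air < 1.0 or pue_liquid < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 - pue_liquid / pue_air


# Mid-range example: PUE 1.40 (air) vs 1.10 (direct-to-chip)
print(f"{total_power_reduction(1.40, 1.10):.1%}")
```

For this pair the reduction is about 21%, inside the 15-25% range quoted above; other pairs drawn from the two PUE ranges can fall outside it.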

Density Economics

Higher density means less floor space, shorter cable runs, and reduced facility costs. A liquid-cooled cluster may need 50% less space than an air-cooled equivalent.

Liquid Cooling Technologies

Direct-to-Chip (D2C)

Cold plates attach directly to the GPUs and CPUs, and liquid circulates through rack-level coolant distribution units (CDUs).

  • Heat Removal: 60-80% of total
  • Rack Density: 60-100 kW/rack
  • PUE Impact: 1.05-1.15
  • Capex Premium: 15-25% over air
  • Maturity: Production-ready

Vendors: CoolIT, Asetek, Vertiv, Zutacore

Immersion Cooling

Entire servers are submerged in dielectric fluid, which removes 100% of the heat.

  • Heat Removal: 100% of total
  • Rack Density: 100-250+ kW/tank
  • PUE Impact: 1.02-1.08
  • Capex Premium: 40-60% over air
  • Maturity: Emerging

Vendors: GRC, LiquidCool, Submer, Iceotope

Cooling Technology Comparison

Factor              | Air Cooling         | Direct-to-Chip       | Immersion
Max Rack Density    | 10-15 kW            | 60-100 kW            | 100-250+ kW
PUE                 | 1.3-1.5             | 1.05-1.15            | 1.02-1.08
Capex ($/kW)        | $150-300            | $200-400             | $300-500
Water Usage         | High (evaporative)  | Medium (closed loop) | Minimal
Serviceability      | Easy                | Moderate             | More complex
Retrofit Difficulty | N/A (baseline)      | Moderate             | Major overhaul
Best For            | Legacy, low-density | AI/HPC clusters      | Max density, new builds
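
The density bands in the comparison can be turned into a first-pass selection rule. This is a minimal sketch: the function name is hypothetical, the cutoffs are simply the upper ends of the air and direct-to-chip ranges above, and it deliberately ignores capex, serviceability, and retrofit difficulty.

```python
def suggest_cooling(rack_kw: float) -> str:
    """Pick a cooling approach from rack density alone (first-pass screen)."""
    if rack_kw <= 0:
        raise ValueError("rack density must be positive")
    if rack_kw <= 15:       # upper end of the air-cooled range
        return "air"
    if rack_kw <= 100:      # upper end of the direct-to-chip range
        return "direct-to-chip"
    return "immersion"      # densities beyond the direct-to-chip range


for density in (10, 40.8, 150):
    print(f"{density} kW/rack -> {suggest_cooling(density)}")
```

A real decision would weigh the other rows of the comparison as well, so the output is a starting point rather than a recommendation.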

Implementation Considerations

Direct-to-Chip Requirements

  • CDU placement: In-row or rear-door, ~1 per 2-4 racks
  • Water supply: 10-20 GPM per MW, chilled or facility water
  • Manifolds: Quick-connect at rack/server level
  • Leak detection: Sensors under racks, at manifolds
  • Air component: CRAC/CRAH units are still needed for the heat the cold plates do not capture

Immersion Requirements

  • Tank sizing: Custom tanks, typically 20-40 servers each
  • Fluid cost: $1-5/liter dielectric fluid (large volumes)
  • Floor loading: Much higher than traditional racks
  • Fire suppression: Different requirements vs air-cooled
  • Maintenance: Drip-dry procedures, specialized training

TCO Analysis: 10 MW AI Datacenter

Air Cooling

  • Capex: $2.5M
  • PUE: 1.40
  • Cooling Power (average): 4.0 MW
  • Annual OpEx (@$0.05/kWh): $1.75M

Direct-to-Chip

  • Capex: $3.5M
  • PUE: 1.10
  • Cooling Power (average): 1.0 MW
  • Annual OpEx (@$0.05/kWh): $0.44M

Immersion

  • Capex: $4.5M
  • PUE: 1.05
  • Cooling Power (average): 0.5 MW
  • Annual OpEx (@$0.05/kWh): $0.22M

Result: D2C breaks even vs air in ~9 months; immersion in ~16 months at $0.05/kWh. Faster payback at higher power prices.
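
The OpEx and break-even figures above can be reproduced with a short sketch. It assumes the cooling overhead equals IT load times (PUE - 1), continuous operation for 8,760 hours per year, and a flat $0.05/kWh; the function and variable names are illustrative, while the capex and PUE inputs are the ones listed above.

```python
HOURS_PER_YEAR = 8760  # assumes continuous operation at constant load


def annual_cooling_opex(it_load_mw: float, pue: float, price_per_kwh: float) -> float:
    """Yearly cost of the non-IT overhead implied by a PUE value, in dollars."""
    overhead_kw = it_load_mw * 1000 * (pue - 1.0)
    return overhead_kw * HOURS_PER_YEAR * price_per_kwh


def payback_months(extra_capex: float, annual_savings: float) -> float:
    """Months for OpEx savings to repay the extra capex."""
    if annual_savings <= 0:
        raise ValueError("no savings, so the option never pays back")
    return 12 * extra_capex / annual_savings


IT_LOAD_MW = 10.0
PRICE = 0.05  # $/kWh

# (capex in dollars, PUE) for each option, taken from the figures above
options = {
    "air": (2.5e6, 1.40),
    "direct-to-chip": (3.5e6, 1.10),
    "immersion": (4.5e6, 1.05),
}

air_capex, air_pue = options["air"]
air_opex = annual_cooling_opex(IT_LOAD_MW, air_pue, PRICE)

for name in ("direct-to-chip", "immersion"):
    capex, pue = options[name]
    opex = annual_cooling_opex(IT_LOAD_MW, pue, PRICE)
    months = payback_months(capex - air_capex, air_opex - opex)
    print(f"{name}: OpEx ${opex / 1e6:.2f}M/yr, payback {months:.1f} months")
```

Under these assumptions the paybacks come out at about 9.1 and 15.7 months. Changing `PRICE` shows the sensitivity to power cost: doubling it halves both payback periods.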

Frequently Asked Questions

Can existing datacenters be retrofitted for liquid cooling?

Direct-to-chip retrofit is feasible with CDU additions and manifold installation. Immersion requires significant floor and structural changes. Many operators are adding liquid-ready infrastructure in new builds even if deploying air initially.

What about GPU warranty with liquid cooling?

NVIDIA supports liquid cooling on DGX and HGX systems. Third-party cold plates on individual GPUs may affect the warranty, so check with NVIDIA and the cold plate vendor. Most enterprise deployments use validated combinations.

How does liquid cooling affect colocation pricing?

Liquid-cooled colo commands a 20-40% premium per kW due to infrastructure requirements. However, higher density means less space is needed, so total cost may be similar or lower. Emerging liquid-ready colos are more competitive.

What is a rear-door heat exchanger (RDHX)?

An RDHX is a passive liquid cooling solution that replaces the rear rack door with a heat exchanger. It can handle 20-30 kW/rack, which places it between air cooling and full D2C. It is a good fit for moderate density without server modifications.
