Liquid Cooling & Thermal Design for AI Datacenters

What are the cooling options for AI datacenters?

AI datacenter cooling has evolved from traditional air cooling (PUE 1.4-1.6) to direct-to-chip liquid cooling (PUE 1.15-1.25) and full immersion cooling (PUE 1.05-1.15). For H100/H200 deployments at 700W+ per GPU, liquid cooling is becoming the industry standard.

Key Data Points

  • Air Cooling: Limited to 15-25kW per rack; PUE 1.4-1.6
  • Direct Liquid Cooling (DLC): Supports 60-100kW per rack; PUE 1.15-1.25
  • Immersion Cooling: Supports 100kW+ per tank; PUE 1.05-1.15
  • OpEx: Liquid cooling reduces annual cooling power costs by 20-40%

Liquid cooling is no longer optional for high-density GPU deployments. This page covers the economics, thermal design trade-offs, and financial impact of air-cooled, liquid-cooled, and immersion-cooled infrastructure for AI workloads.

At a Glance

  • Rack Density: 60-100kW per rack (liquid)
  • PUE Improvement: 1.15-1.25 vs 1.4-1.6 air
  • CapEx Premium: +15-30% for liquid retrofit
  • OpEx Savings: 20-40% on cooling costs
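
To make the PUE numbers concrete, here is a minimal sketch of the overhead cost they imply. The 1MW IT load and $0.065/kWh electricity rate are illustrative assumptions, chosen so the delta lands near the ~$170K/yr cooling savings modeled later on this page.

```python
# Minimal sketch: annual cooling/overhead cost implied by PUE.
# PUE = total facility power / IT power, so non-IT overhead = IT load * (PUE - 1).

HOURS_PER_YEAR = 8760

def annual_overhead_cost(it_load_kw: float, pue: float, usd_per_kwh: float) -> float:
    """Annual cost of the non-IT (mostly cooling) share of facility power."""
    overhead_kw = it_load_kw * (pue - 1.0)
    return overhead_kw * HOURS_PER_YEAR * usd_per_kwh

IT_LOAD_KW = 1_000      # 1MW of GPU/IT load (assumption)
RATE = 0.065            # $/kWh, illustrative industrial rate (assumption)

for label, pue in [("air", 1.50), ("liquid", 1.20)]:
    cost = annual_overhead_cost(IT_LOAD_KW, pue, RATE)
    print(f"{label}: PUE {pue} -> ${cost:,.0f}/yr overhead")
# air: PUE 1.5 -> $284,700/yr; liquid: PUE 1.2 -> $113,880/yr (delta ~ $171K/yr)
```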

Cooling Challenges for AI Loads

Modern AI GPUs like the H100 and upcoming B100 generate 700W-1000W per chip, pushing rack densities to 60-100kW compared to traditional 5-15kW enterprise racks. Air cooling becomes thermally and economically impractical at these densities, forcing infrastructure operators to adopt liquid cooling solutions.

The financial implications extend beyond cooling equipment costs. Higher densities enable better space utilization (more compute per sq ft), lower PUE reduces power consumption (critical for ERCOT and other grid-constrained markets), and improved thermal management extends GPU lifespan and reduces throttling-related performance degradation.

For greenfield builds, designing for liquid from the start reduces costs. For retrofits of existing air-cooled facilities, the economics depend on power availability, remaining useful life, and competitive positioning. Both scenarios require careful financial modeling to optimize TCO.
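
As a rough sketch of the density math driving that decision, the snippet below estimates rack counts for a GPU fleet under air-cooled versus liquid-cooled densities. The fleet size, per-GPU wattage, and 1.3x server overhead factor are assumptions for illustration.

```python
import math

def racks_needed(num_gpus: int, watts_per_gpu: float, rack_kw: float,
                 server_overhead: float = 1.3) -> int:
    """Racks required for a fleet; `server_overhead` covers CPUs, NICs, fans, etc."""
    total_kw = num_gpus * watts_per_gpu / 1000 * server_overhead
    return math.ceil(total_kw / rack_kw)

FLEET = 1_024  # H100-class GPUs at ~700W each (assumption)
for label, rack_kw in [("air @ 20kW/rack", 20), ("DLC @ 80kW/rack", 80)]:
    print(f"{label}: {racks_needed(FLEET, 700, rack_kw)} racks")
# air @ 20kW/rack: 47 racks; DLC @ 80kW/rack: 12 racks (~4x less floor space)
```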

Liquid vs Air vs Immersion Cooling

Air Cooling

  • Max Rack Density: 15-25kW
  • PUE Range: 1.4-1.6
  • CapEx: Baseline

Pros:
  • Lowest upfront cost
  • Mature technology
  • Easy maintenance

Cons:
  • Limited GPU density
  • Higher OpEx at scale
  • Not viable for H100+ densities

Direct Liquid Cooling

  • Max Rack Density: 60-100kW
  • PUE Range: 1.15-1.25
  • CapEx: +15-30%

Pros:
  • Optimal for H100/B100
  • 20-40% cooling cost savings
  • Better space efficiency
  • Emerging industry standard

Cons:
  • Higher initial investment
  • Specialized maintenance
  • Leak risk management

Immersion Cooling

  • Max Rack Density: 100-250kW (per tank)
  • PUE Range: 1.05-1.15
  • CapEx: +40-60%

Pros:
  • Highest density possible
  • Best achievable PUE
  • Future-proof for next-gen chips

Cons:
  • Highest upfront cost
  • Complex operations
  • Limited vendor ecosystem
  • Hardware compatibility constraints
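
One way to read the comparison above is as a simple decision rule keyed to target per-rack power. The thresholds in this sketch mirror the Max Rack Density figures and should be treated as rough guides, not hard vendor limits.

```python
def cooling_options(rack_kw: float) -> list[str]:
    """Cooling technologies plausible at a given per-rack power draw."""
    options = []
    if rack_kw <= 25:
        options.append("air")
    if rack_kw <= 100:
        options.append("direct liquid (DLC)")
    if rack_kw <= 250:
        options.append("immersion")
    return options or ["beyond listed ranges: custom design needed"]

for kw in (12, 40, 90, 180):
    print(f"{kw}kW/rack -> {', '.join(cooling_options(kw))}")
# 12kW -> air, DLC, immersion; 90kW -> DLC, immersion; 180kW -> immersion
```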

Impact on Financial Models

Example: 1MW AI datacenter deployment (120 racks @ ~8kW average GPU load per rack)

Metric                         Air Cooling    Liquid Cooling    Delta
Initial CapEx                  $15M           $18.5M            +$3.5M
Annual Power Cost (cooling)    $420K          $250K             -$170K/yr
Usable Rack Density            15kW           80kW              5.3x
PUE                            1.50           1.20              -20%
5-Year TCO                     $17.1M         $19.8M            +$2.7M
TCO per kW (GPU)               $1,425         $206              -86%
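
The 5-Year TCO rows can be reproduced as CapEx plus five years of cooling power cost. This is an undiscounted sum; a real model would discount cash flows and include the other OpEx lines, which the table holds equal between scenarios.

```python
def five_year_tco(capex: float, annual_cooling_cost: float, years: int = 5) -> float:
    """Undiscounted TCO: upfront CapEx plus cumulative cooling power cost."""
    return capex + years * annual_cooling_cost

air = five_year_tco(15.0e6, 420e3)     # $17.10M, matching the table
liquid = five_year_tco(18.5e6, 250e3)  # $19.75M, shown rounded as $19.8M
print(f"air ${air/1e6:.2f}M, liquid ${liquid/1e6:.2f}M, "
      f"delta +${(liquid - air)/1e6:.2f}M")
# delta prints +$2.65M; the table shows +$2.7M after rounding liquid to $19.8M
```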

Key Takeaway

While liquid cooling adds 15-30% CapEx, the ability to achieve 5x+ rack density means dramatically lower cost per kW of GPU compute. For high-utilization AI workloads, liquid cooling typically achieves payback within 18-24 months through OpEx savings and space efficiency.
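
A note on the payback arithmetic: the ~$170K/yr cooling savings alone would take far longer than 24 months to recover a $3.5M premium, so the claim depends on monetizing the density gain. The sketch below makes that explicit; the monthly value assigned to the 5.3x density is a hypothetical input, not a figure from this page.

```python
def payback_months(capex_premium: float, monthly_opex_savings: float,
                   monthly_density_value: float) -> float:
    """Simple payback: premium divided by total monthly benefit."""
    return capex_premium / (monthly_opex_savings + monthly_density_value)

PREMIUM = 3.5e6            # liquid CapEx delta from the table above
OPEX_SAVINGS = 170e3 / 12  # cooling power savings per month
DENSITY_VALUE = 150e3      # hypothetical $/month value of 5.3x rack density
print(f"payback: {payback_months(PREMIUM, OPEX_SAVINGS, DENSITY_VALUE):.0f} months")
# ~21 months at these assumptions, consistent with the 18-24 month range
```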
