Liquid Cooling & Thermal Design for AI Datacenters
What are the cooling options for AI datacenters?
AI datacenter cooling has evolved from traditional air cooling (PUE 1.4-1.6) to direct-to-chip liquid cooling (PUE 1.15-1.25) and full immersion cooling (PUE 1.05-1.15). For H100/H200 deployments at 700W+ per GPU, liquid cooling is becoming the industry standard.
Key Data Points
- Air Cooling: Limited to 15-25kW per rack; PUE 1.4-1.6
- Direct Liquid Cooling (DLC): Supports 60-100kW per rack; PUE 1.15-1.25
- Immersion Cooling: Supports 100-250kW per tank; PUE 1.05-1.15
- OpEx: Liquid cooling reduces annual cooling power costs by 20-40%
Liquid cooling is no longer optional for high-density GPU deployments. The sections below break down the economics, thermal design trade-offs, and TCO impact of air-cooled, liquid-cooled, and immersion-cooled infrastructure for AI workloads.
- Rack Density: 60-100kW per rack (liquid)
- PUE Improvement: 1.15-1.25, vs 1.4-1.6 for air
- CapEx Premium: 15-30% for a liquid retrofit
- OpEx Savings: 20-40% on cooling costs
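To see how a PUE gap becomes an OpEx line item, here is a minimal sketch. The 1MW IT load and $0.06/kWh rate are illustrative assumptions (the worked example later in this piece implies a different power price), so treat the outputs as directional:

```python
# Sketch: translate a PUE difference into annual cooling energy cost.
# The 1 MW IT load and $0.06/kWh rate are illustrative assumptions.

HOURS_PER_YEAR = 8760

def annual_cooling_cost(it_load_kw: float, pue: float, price_per_kwh: float) -> float:
    """Cost of the non-IT (mostly cooling) overhead implied by a PUE."""
    overhead_kw = it_load_kw * (pue - 1.0)  # facility power beyond the IT load
    return overhead_kw * HOURS_PER_YEAR * price_per_kwh

air = annual_cooling_cost(1000, 1.50, 0.06)     # ~$263K/yr
liquid = annual_cooling_cost(1000, 1.20, 0.06)  # ~$105K/yr
print(f"air: ${air:,.0f}/yr  liquid: ${liquid:,.0f}/yr  delta: ${air - liquid:,.0f}/yr")
```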
Cooling Challenges for AI Loads
Modern AI GPUs like the H100 and upcoming B100 generate 700W-1000W per chip, pushing rack densities to 60-100kW compared to traditional 5-15kW enterprise racks. Air cooling becomes thermally and economically impractical at these densities, forcing infrastructure operators to adopt liquid cooling solutions.
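A back-of-envelope rack-power model shows why; the GPU count per rack and the non-GPU overhead fraction (CPUs, NICs, fans, power-conversion losses) are hypothetical assumptions for illustration:

```python
# Sketch: why H100-class chips push racks past what air can remove.
# gpus_per_rack and overhead_frac are assumed values; the 700-1000W
# per-GPU range comes from the text above.

def rack_power_kw(gpus_per_rack: int, gpu_watts: float, overhead_frac: float = 0.35) -> float:
    """Total rack draw: GPUs plus CPUs, NICs, fans, and conversion losses."""
    return gpus_per_rack * gpu_watts * (1 + overhead_frac) / 1000

for gpu_watts in (700, 1000):
    print(f"{gpu_watts}W GPUs, 32/rack -> {rack_power_kw(32, gpu_watts):.0f} kW")
# 700W -> ~30 kW, 1000W -> ~43 kW: already past the 15-25kW air-cooling
# ceiling, and dense 64-GPU configurations land in the 60-100kW range.
```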
The financial implications extend beyond cooling equipment costs. Higher densities enable better space utilization (more compute per sq ft), lower PUE reduces power consumption (critical for ERCOT and other grid-constrained markets), and improved thermal management extends GPU lifespan and reduces throttling-related performance degradation.
For greenfield builds, designing for liquid cooling from the start is substantially cheaper than retrofitting later. For retrofits of existing air-cooled facilities, the economics depend on power availability, the facility's remaining useful life, and competitive positioning. Both scenarios require careful financial modeling to optimize TCO.
Liquid vs Air vs Immersion Cooling
| Metric | Air Cooling | Direct Liquid Cooling | Immersion Cooling |
|---|---|---|---|
| Max Rack Density | 15-25kW | 60-100kW | 100-250kW |
| PUE Range | 1.4-1.6 | 1.15-1.25 | 1.05-1.15 |
| CapEx | Baseline | +15-30% | +40-60% |
| Pros | Lowest upfront cost; mature technology; easy maintenance | Optimal for H100/B100; 20-40% cooling savings; better space efficiency; emerging industry standard | Highest achievable density; best achievable PUE; future-proof for next-gen chips |
| Cons | Limited GPU density; higher OpEx at scale; not viable for H100+ | Higher initial investment; specialized maintenance; leak risk management | Highest upfront cost; complex operations; limited vendor ecosystem; hardware compatibility constraints |
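One way to use this comparison in a capacity-planning model is to encode each option and pick the cheapest tier that meets a target rack density. The figures below come from the table above; the greedy selection rule is an illustrative assumption, not a recommendation:

```python
# Sketch: pick the cheapest cooling tier that supports a target density.
from dataclasses import dataclass

@dataclass
class CoolingOption:
    name: str
    max_rack_kw: float
    pue: float            # midpoint of the quoted range
    capex_premium: float  # vs. the air-cooled baseline

OPTIONS = [
    CoolingOption("air", 25, 1.50, 0.00),
    CoolingOption("direct liquid", 100, 1.20, 0.225),
    CoolingOption("immersion", 250, 1.10, 0.50),
]

def cheapest_viable(target_rack_kw: float) -> CoolingOption:
    """Return the lowest-CapEx option that can cool the target density."""
    viable = [o for o in OPTIONS if o.max_rack_kw >= target_rack_kw]
    if not viable:
        raise ValueError(f"no cooling option supports {target_rack_kw} kW/rack")
    return min(viable, key=lambda o: o.capex_premium)

print(cheapest_viable(80).name)   # direct liquid
print(cheapest_viable(150).name)  # immersion
```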
Impact on Financial Models
Example: 1MW of GPU load deployed in a 120-rack facility (~8kW average per rack as air-cooled today). The power-cost rows assume that 1MW load; the density and TCO-per-kW rows compare usable capacity across the same 120-rack footprint.

| Metric | Air Cooling | Liquid Cooling | Delta |
|---|---|---|---|
| Initial CapEx | $15M | $18.5M | +$3.5M |
| Annual Power Cost (cooling) | $420K | $250K | -$170K/yr |
| Usable Rack Density | 15kW | 80kW | 5.3x |
| PUE | 1.50 | 1.20 | -20% |
| 5-Year TCO | $17.1M | $19.8M | +$2.7M |
| TCO per Usable kW (GPU) | ~$9,500 | ~$2,060 | -78% |
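The derived rows follow mechanically from the stated inputs; a minimal sketch of that arithmetic (the per-kW figures divide 5-year TCO by the usable GPU capacity of the 120-rack footprint):

```python
# Sketch: reproduce the table's derived rows from its stated inputs.
RACKS, YEARS = 120, 5

def five_year_tco(capex: float, annual_cooling_cost: float) -> float:
    """CapEx plus five years of cooling power cost."""
    return capex + YEARS * annual_cooling_cost

def tco_per_usable_kw(tco: float, rack_kw: float) -> float:
    """TCO spread over the usable GPU capacity of the footprint."""
    return tco / (RACKS * rack_kw)

air_tco = five_year_tco(15_000_000, 420_000)      # $17.1M
liquid_tco = five_year_tco(18_500_000, 250_000)   # ~$19.8M
print(f"air:    ${tco_per_usable_kw(air_tco, 15):,.0f}/kW")     # ~$9,500/kW
print(f"liquid: ${tco_per_usable_kw(liquid_tco, 80):,.0f}/kW")  # ~$2,057/kW
```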
Key Takeaway
While liquid cooling adds 15-30% CapEx, supporting 5x+ rack density means a dramatically lower cost per usable kW of GPU compute. For high-utilization AI workloads, liquid cooling typically pays back within 18-24 months, driven primarily by the extra compute the same footprint can host; cooling OpEx savings alone would take far longer to recover the premium.
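A rough payback sketch makes those drivers explicit. The $3.5M premium and $170K/yr cooling savings come from the table above; the $2M/yr value assigned to the reclaimed density is a hypothetical placeholder, not a figure from this analysis:

```python
# Sketch: payback on the liquid-cooling CapEx premium.
# capex_premium and cooling savings are from the table above;
# annual_density_value is a HYPOTHETICAL figure for incremental
# compute revenue enabled by the higher rack density.

def payback_months(capex_premium: float, annual_cooling_savings: float,
                   annual_density_value: float = 0.0) -> float:
    return 12 * capex_premium / (annual_cooling_savings + annual_density_value)

print(f"{payback_months(3_500_000, 170_000):.0f} months")             # ~247: cooling savings alone
print(f"{payback_months(3_500_000, 170_000, 2_000_000):.0f} months")  # ~19: with density value
```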