Session: Session 02-05: Two Phase Cooling - II
Paper Number: 164278
164278 - Direct on Die Two Phase Cooling Approach for High Power GPUs
As the artificial intelligence boom accelerates, individual servers and racks have become far more power-hungry. Over the past three years, there has been a 400% increase in the thermal design power (TDP) of both GPUs and CPUs. This surge in power demand has necessitated a transition to liquid cooling, as air cooling has reached its practical limits. Among liquid cooling solutions, two-phase direct-to-chip cooling has emerged as a promising approach to managing escalating TDPs and heat fluxes because boiling provides high heat transfer coefficients. Additionally, the use of a dielectric working fluid safeguards IT equipment against damage in the event of a leak, which is an inherent risk in single-phase cooling solutions such as direct-to-chip water cooling.
In the industry, this cooling method is conventionally referred to as direct-to-chip cooling, even though the coolant does not come into direct contact with the chip surface; instead, a thermal interface material (TIM) and a cold plate sit between the chip and the coolant. This study explores the effectiveness of a two-phase cooling approach in which the refrigerant impinges directly on the die, eliminating the need for both a TIM and a cold plate and thereby reducing the case-to-fluid thermal resistance. The working fluid used in this study was the medium-pressure refrigerant R-515B. Testing was conducted on a thermal test vehicle (TTV) designed to mimic the die area of the NVIDIA Blackwell (B200) GPU; the TTV supported a TDP of up to 4 kW, subjecting the system to heat fluxes exceeding 250 W/cm².
Experiments were performed on both a flat die and an enhanced surface featuring a skived fin base, which simulates a heat sink that could be bonded directly to the chip surface. Various fluid manifold designs were evaluated to assess how different flow paths influence heat transfer performance. The results demonstrated that the direct-to-die approach significantly reduced the case-to-fluid thermal resistance. This improvement in thermal resistance also allows the system to operate effectively at higher facility water temperatures of 50–55°C. The enhanced performance and potential for heat reuse at elevated temperatures highlight the feasibility of integrating heat sinks directly onto the silicon chip surface, enabling a direct-to-die cooling strategy.
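The quantities discussed above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' method: it uses the standard definitions of average heat flux (power divided by heated area) and case-to-fluid thermal resistance (temperature difference divided by power). The function names are hypothetical, and the resistance and fluid temperature in the usage example are placeholder assumptions, not results from the paper. Only the 4 kW and 250 W/cm² figures come from the abstract.

```python
def heat_flux_w_per_cm2(power_w: float, area_cm2: float) -> float:
    """Average heat flux over the heated area, in W/cm^2."""
    if area_cm2 <= 0:
        raise ValueError("area_cm2 must be positive")
    return power_w / area_cm2


def case_to_fluid_resistance(t_case_c: float, t_fluid_c: float, power_w: float) -> float:
    """Case-to-fluid thermal resistance R = (T_case - T_fluid) / Q, in K/W."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (t_case_c - t_fluid_c) / power_w


def case_temperature_c(t_fluid_c: float, resistance_k_per_w: float, power_w: float) -> float:
    """Case temperature implied by a given resistance: T_case = T_fluid + R * Q."""
    return t_fluid_c + resistance_k_per_w * power_w


if __name__ == "__main__":
    # Figures from the abstract: up to 4 kW and heat fluxes above 250 W/cm^2.
    power_w = 4000.0
    # Heated area at which 4 kW corresponds to exactly 250 W/cm^2.
    area_at_250 = power_w / 250.0
    print(f"area at 250 W/cm^2: {area_at_250:.1f} cm^2")

    # ASSUMPTION: placeholder resistance and fluid temperature, for illustration only.
    hypothetical_r = 0.005   # K/W, not a measured value
    hypothetical_t_fluid = 55.0  # deg C, treated here as the fluid reference temperature
    t_case = case_temperature_c(hypothetical_t_fluid, hypothetical_r, power_w)
    print(f"case temperature: {t_case:.1f} C")
```

A lower case-to-fluid resistance shrinks the `R * Q` term, which is why the same case temperature limit can be met with a warmer fluid. Note that the fluid reference temperature in such a calculation is that of the refrigerant, which in a real system would sit above the facility water temperature.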
This study underscores the potential of two-phase cooling, particularly when using a dielectric fluid, as it enables direct impingement on the die without the risk of short-circuiting IT equipment. These findings suggest a promising pathway for GPU manufacturers to adopt this advanced cooling approach, facilitating more efficient thermal management in high-performance computing environments.
Presenting Author: Akshith Narayanan, Accelsius
Presenting Author Biography: Akshith Narayanan is a recent master's graduate in mechanical engineering from the Georgia Institute of Technology. He completed his degree with a thesis investigating near-junction flow boiling for high-power electric vehicle inverters. He joined Accelsius, an emerging two-phase direct-to-chip liquid cooling company that aims to provide an elegant two-phase solution for high-power data center chips. He has years of experience in heat transfer and fluid mechanics, with specific expertise in two-phase boiling.
Paper Type
Technical Paper Publication