AI Unsurprisingly Dominates Hot Chips 2024

This year's edition of the annual Hot Chips conference represented the peak of the generative-AI hype cycle. Consistent with the theme, OpenAI's Trevor Cai made the bull case for AI compute in his keynote. At a conference known for technical disclosures, however, the presentations from merchant chip vendors were disappointing; despite a great lineup of talks, few new details emerged. Nvidia's Blackwell presentation mostly rehashed previously disclosed information, but in a picture-is-worth-a-thousand-words moment, one slide included a photo of the GB200 NVL36 rack, shown below.

GB200 NVL36 rack (Source: Nvidia)

Many customers prefer the NVL36 over the power-hungry NVL72 configuration, which requires a massive 120 kW per rack. The key difference for our readers is that the NVLink switch trays shown in the middle of the rack have front-panel cages, whereas the "non-scalable" NVLink switch tray used in the NVL72 has only back-panel connectors for the NVLink spine/backplane. Although Nvidia omitted cabling details, external NVLink cables could include passive copper (DAC) and active copper (ACC/AEC) as well as pluggable optics. The NVL36 thus creates a new market segment for cabling, albeit primarily for copper in the near term.
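
As a quick back-of-envelope check on that power figure, the sketch below spreads the 120 kW rack budget across GPU slots. The 72- and 36-GPU counts per rack are the widely reported configurations rather than numbers from Nvidia's talk, so treat the output as a rough estimate only.

```python
# Back-of-envelope rack power math. Only the 120 kW figure comes from
# the text above; the GPU counts are widely reported configuration
# details, assumed here for illustration.

NVL72_RACK_KW = 120   # per-rack power cited for the GB200 NVL72
NVL72_GPUS = 72       # assumed GPU count per NVL72 rack
NVL36_GPUS = 36       # assumed GPU count per NVL36 rack

# All-in power per GPU slot (amortizes CPUs, switch trays, and fans).
kw_per_gpu = NVL72_RACK_KW / NVL72_GPUS

# Naive estimate for an NVL36 rack, scaling linearly by GPU count.
est_nvl36_kw = kw_per_gpu * NVL36_GPUS

print(f"~{kw_per_gpu:.2f} kW per GPU slot (all-in)")
print(f"~{est_nvl36_kw:.0f} kW naive estimate for an NVL36 rack")
```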

Intel announced its Gaudi 3 AI accelerator back in April, but its networking specifications are worth reiterating. The new chip integrates 24x200G Ethernet ports using 48x112G SerDes. Like Microsoft's Maia, Gaudi 3 chips connect directly for scale-up and through switches for cluster scale-out. The integrated NICs support the RoCE protocol for low-latency RDMA transfers. In a separate talk, Intel once again discussed the 4 Tbps Optical Compute Interconnect (OCI) chiplet that we've written about previously. The speaker emphasized the value of laser integration and stressed the 0.1 FIT reliability seen across the more than 30 million lasers Intel has shipped to date.
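
The Gaudi 3 port and SerDes counts are easy to sanity-check; the short sketch below works through the arithmetic using only the figures quoted above.

```python
# Sanity check on the Gaudi 3 networking figures quoted above.

ports = 24          # integrated Ethernet ports
port_gbps = 200     # payload rate per port (200G)
serdes = 48         # 112G-class SerDes lanes backing those ports
serdes_gbps = 112   # raw rate per SerDes lane

aggregate_tbps = ports * port_gbps / 1000    # total payload bandwidth
lanes_per_port = serdes // ports             # SerDes lanes per 200G port
raw_per_port = lanes_per_port * serdes_gbps  # raw rate vs. 200G payload

print(f"{aggregate_tbps} Tbps aggregate Ethernet bandwidth")
print(f"{lanes_per_port} lanes per port -> {raw_per_port}G raw per 200G "
      f"port (the gap covers line coding/FEC overhead and headroom)")
```

The two-lanes-per-port split and the raw-versus-payload gap fall straight out of the disclosed numbers.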

Broadcom presented its co-packaged optics (CPO) technology for AI-compute ASICs with optical attach. Whereas the first part of the presentation mostly covered the company's Bailly CPO switch, the latter part included some new details and concepts for CPO in XPUs. The company introduced the concept of package "oceanfront," which differs from silicon beachfront: attaching the optical engines to the CoWoS package substrate provides more linear edge space for CPO attach than the silicon interposer does. The talk also introduced the idea of using bidirectional (bidi) signaling in future optical engines to cut the number of required fibers in half.
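
The fiber-halving claim is simple arithmetic: a conventional engine needs separate transmit and receive fibers per lane, while a bidi engine carries both directions on one fiber. Here's a minimal sketch; the 64-lane engine size is purely illustrative, not a Broadcom-disclosed figure.

```python
# Minimal sketch of the fiber-count arithmetic behind bidi optics.

def fiber_count(lanes: int, bidi: bool) -> int:
    """Fibers needed for `lanes` full-duplex optical lanes.

    Unidirectional: one TX fiber plus one RX fiber per lane.
    Bidi: both directions share a single fiber (typically on
    separate wavelengths), halving the fiber count.
    """
    return lanes if bidi else 2 * lanes

LANES = 64  # hypothetical optical-engine lane count
print(fiber_count(LANES, bidi=False))  # 128 fibers without bidi
print(fiber_count(LANES, bidi=True))   # 64 fibers with bidi
```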

Maybe it's appropriate that the merchant chip vendors took a back seat to data-center operators this year. After all, it's the latter group that is driving the massive investments in AI infrastructure. Diversity of AI architectures should be good for the supply chain, opening opportunities outside of Nvidia's bespoke full-stack solutions.

LightCounting subscribers can access additional details, figures, and analysis in a full-length research note: www.lightcounting.com/login
