Nvidia powers further into the CPU market with new rack systems packing 256 Vera processors

The cubicles of the agentic AI age are cores

The Register

GTC Intel and AMD, take notice. At GTC on Monday, Nvidia unveiled its latest liquid-cooled rack system. But unlike its NVL72 racks, this one isn't powered by GPUs, or even Groq LPUs, but rather by 256 of its custom Vera CPUs.

The system is designed to support AI training techniques like reinforcement learning as well as agentic AI frameworks and services that can't run on GPUs alone.

"Agents don't operate on GPUs alone. They need CPUs in order to do their work. Whether we're training agentic models or serving them, GPUs today actually call out to CPUs in order to do the tool calling, SQL queries, and the compilation of code," Ian Buck, VP of Hyperscale and HPC at Nvidia, told press on Sunday. "This sandbox execution is a critical part of both training and deploying agents across data centers."

Those CPUs need to be fast to avoid becoming a bottleneck. That, Buck argues, requires a new kind of AI-optimized CPU, one that balances per-core frequency, density, and power efficiency.

Nvidia is no stranger to CPU design. Its first datacenter CPU, Grace, was announced nearly five years ago and has since become an integral part of the company's Grace-Hopper and Grace-Blackwell rack systems.

While most of these deployments were tied to GPU systems or HPC clusters, Meta recently revealed plans to deploy Nvidia's standalone Grace CPUs at scale within its datacenters.

A closer look at Vera

Vera is Nvidia's latest CPU and brings several notable improvements, including 88 custom Olympus Arm cores, support for simultaneous multithreading, a much wider memory bus, and faster chip-to-chip interconnects.

In addition to powering Nvidia's Vera-Rubin superchips, paper-launched at CES earlier this year, Nvidia plans to offer its CPUs as an alternative to x86 chips from Intel and AMD.

The company is making some rather bold claims about its latest CPU superchips. If Nvidia is to be believed, Vera will deliver 3x the memory bandwidth and 1.5x the performance per core of contemporary x86 processors.

Much of that performance is down to Nvidia's new Olympus Arm cores, which now feature a 10-wide decode pipeline with what Nvidia describes as a "neural branch predictor" that can perform two branch predictions per cycle. 

Branch prediction is key to performance in modern CPUs: the processor anticipates which path the code will take at a branch and speculatively executes down it before the outcome is known. By making two predictions per cycle, Vera reduces the likelihood of a costly misprediction, theoretically boosting its performance in the process.

Nvidia also benefits from its use of LPDDR5X memory, more commonly found in notebook computers, rather than the RDIMMs used by conventional servers.

Each Vera CPU can be equipped with up to 1.5 TB of LPDDR5X SOCAMM memory modules, good for 1.2 TB/s of bandwidth per socket. For reference, Intel's flagship Xeon 6900P processors top out at 825 GB/s of bandwidth when using 8800 MT/s MRDIMMs, while AMD's Turin processors manage between 560 and 600 GB/s.

The chips also feature faster NVLink-C2C interconnects, enabling them to shuffle data to and from other CPUs or GPUs at up to 900 GB/s in each direction, advertised as 1.8 TB/s of bidirectional bandwidth.

Many of the tasks performed by agentic systems involve retrieving data and executing code against it, making high memory bandwidth key to avoiding bottlenecks.

Availability

Vera will be available in both single- and dual-socket configurations from the usual ODM and OEM suspects, including Foxconn, Wistron, Dell Tech, Lenovo, and HPE, to name a handful.

That means that this time around Nvidia will actually be competing head-to-head with AMD and Intel in the CPU space. To this end, Nvidia says that its NVL8 HGX systems, which have traditionally used x86 processors from Intel, will be offered with Vera CPUs for the Rubin generation.

For high-density deployments, Nvidia also has a new MGX reference design that packs up to 256 Vera processors along with 64 BlueField-4 data processing units into a single liquid-cooled rack, providing more than 22,500 CPU cores and 400 TB of memory for agents to retrieve data and execute code.

It doesn't look like Nvidia will need to fight to win over customers. Alibaba, ByteDance, Meta, Oracle, CoreWeave, Lambda, Nebius, and NScale have all committed to deploying the chips in their datacenters when Vera makes its debut later this year. ®