Tensordyne Napier AI Chip Promises Huge Inference Gains Over NVIDIA Blackwell And Rubin
by Matt Lawrence · OnMSFT
Tensordyne has announced the successful tape-out of its 3nm Napier AI chip, with the company claiming major gains over NVIDIA Blackwell and Rubin in AI inference workloads. The new chip will power the Tensordyne Napier TDN system, which targets large AI models with higher token throughput, lower power use, and tighter memory integration.
Napier Targets AI Inference At Scale
Tensordyne says Napier uses TSMC’s 3nm process and packs 138 billion transistors, 144GB of HBM3E memory, 256MB of SRAM, and 2.1 PFLOPs of dense FP8 AI compute within a 300W power envelope. The company has also highlighted more than $200 million in forecasted Napier system demand as it moves toward beta deployment.
The Napier platform combines logarithmic AI math, integrated SRAM and HBM memory, and an any-to-any scale-up interconnect. This design replaces large multiplication operations with addition-based computation, which Tensordyne says improves performance per watt across large frontier models.
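The idea of swapping multiplication for addition can be illustrated with a generic logarithmic number system. The sketch below is not Tensordyne's implementation, whose number format has not been described here. It only shows the underlying identity log2(a × b) = log2(a) + log2(b), and the helper names `to_log`, `log_mul`, and `from_log` are hypothetical.

```python
import math

def to_log(x: float) -> tuple[int, float]:
    """Encode a nonzero value as (sign, log2 of its magnitude)."""
    if x == 0:
        # Zero has no finite logarithm; a real format would need a special code.
        raise ValueError("zero cannot be encoded in this toy log format")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def log_mul(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Multiply two log-domain values: signs multiply, logarithms add."""
    return (a[0] * b[0], a[1] + b[1])

def from_log(v: tuple[int, float]) -> float:
    """Decode (sign, log2 magnitude) back to an ordinary float."""
    return v[0] * 2.0 ** v[1]

# 6.0 * -0.5 computed with one addition in the log domain.
product = from_log(log_mul(to_log(6.0), to_log(-0.5)))
print(round(product, 6))
```

In hardware, an adder is generally much smaller and cheaper in energy than a multiplier of the same width, which is the intuition behind the performance-per-watt claim. Addition of two log-domain values is the harder operation in such systems and is not covered by this sketch.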
TDN72 Rack Takes Aim At NVIDIA
Tensordyne’s TDN72 Inference Pod uses 72 Napier AI chips, while a full rack includes 288 chips across four pods. The rack delivers 608 PFLOPs of FP8 compute, 74GB of SRAM, 42TB of HBM3E memory, and a rated power draw of 120kW.
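The rack figures can be cross-checked against the per-chip specifications quoted earlier. The short calculation below is a rough sanity check, assuming decimal units (1 TB = 1,000 GB, 1 GB = 1,000 MB) and simple multiplication by chip count. The constant names are illustrative.

```python
# Per-chip figures as quoted in the article.
CHIPS_PER_POD = 72
PODS_PER_RACK = 4
PFLOPS_PER_CHIP = 2.1      # dense FP8
HBM_GB_PER_CHIP = 144
SRAM_MB_PER_CHIP = 256
CHIP_POWER_W = 300

chips = CHIPS_PER_POD * PODS_PER_RACK            # 288 chips per rack
rack_pflops = chips * PFLOPS_PER_CHIP            # about 604.8 PFLOPs
rack_hbm_tb = chips * HBM_GB_PER_CHIP / 1000     # about 41.5 TB
rack_sram_gb = chips * SRAM_MB_PER_CHIP / 1000   # about 73.7 GB
chip_power_kw = chips * CHIP_POWER_W / 1000      # about 86.4 kW for the chips alone

print(chips, round(rack_pflops, 1), round(rack_hbm_tb, 1),
      round(rack_sram_gb, 1), round(chip_power_kw, 1))
```

The results land close to the quoted 608 PFLOPs, 42TB, and 74GB, with the small gaps consistent with rounding in the published per-chip numbers. The chips' combined 300W envelopes account for roughly 86 kW of the 120kW rack rating. The article does not break down the remainder.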
According to Tensordyne, Napier delivers 17 times more tokens per watt and 13 times higher token throughput than NVIDIA Blackwell, while a single rack can support multi-trillion parameter models at up to 1,000 tokens per second per user. The company also claims its system can match a much larger Rubin-based setup with far less infrastructure, giving Napier a strong pitch for AI companies focused on inference efficiency.