Bill Gates-backed startup aims to revive Moore's Law with optical transistors
Neurophos is developing a massive optical systolic array clocked at 56 GHz, good for 470 petaFLOPS of FP4 compute
by Tobias Mann · The Register
As Moore's Law slows to a crawl and the amount of energy required to deliver generational performance gains grows, some chip designers are looking to alternative architectures for salvation.
Neurophos is among those trying to upend Moore's Law and make good on analog computing's long-promised yet largely untapped potential.
The Austin, Texas-based AI chip startup says it's developing an optical processing unit (OPU) that in theory is capable of delivering 470 petaFLOPS of FP4 / INT4 compute — about 10x that of Nvidia's newly unveiled Rubin GPUs — while using roughly the same amount of power.
Neurophos CEO Patrick Bowen tells El Reg this is possible in part because of the micron-scale metamaterial optical modulators, essentially photonic transistors, that the company has spent the past several years developing.
"The equivalent of the optical transistor that you get from Silicon Photonics factories today is massive. It's like 2 mm long. You just can't fit enough of them on chip in order to get a compute density that remotely competes with digital CMOS today," he explained.
Neurophos' optical transistors, Bowen says, are roughly 10,000x smaller. "We got our first silicon back in May demonstrating that we could do that with a standard CMOS process, which means it's compatible with existing foundry technologies."
Using these transistors, Neurophos claims to have developed the optical equivalent of a tensor core. "On chip, there is a single photonic tensor core that is 1,000 by 1,000 [processing elements] in size," he said.
This is quite a bit bigger than what's typically seen in most AI accelerators and GPUs, which employ matrix multiplication engines that are at most 256x256 processing elements in size.
However, rather than having dozens or even hundreds of these tensor cores, like we see in Nvidia's GPUs, Neurophos only needs one. Bowen tells us the tensor core on its first-gen accelerator will occupy roughly 25 mm².
The rest of the reticle-sized chip is "the boondoggle of what it takes to support this insane tensor core," Bowen said.
Specifically, Neurophos needs a whack-ton of vector processing units and SRAM to keep the tensor core from starving for data. This is because the tensor core itself — and yes, again, there'll only be one of them on the entire reticle-sized die — is operating at around 56 gigahertz.
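To get a feel for why a single tensor core can starve for data, here is a back-of-envelope sketch in Python. It uses only the 1,000 by 1,000 array size and the 56 GHz clock quoted above. Everything else is an assumption made for illustration: that each processing element completes one multiply-accumulate per cycle, that a multiply-accumulate counts as two operations, and that the array consumes one 4-bit input value per row per cycle. The function names are hypothetical, and the article does not say how the 470 petaOPS headline figure is derived, so this sketch should not be expected to reproduce it.

```python
def systolic_peak_ops(rows: int, cols: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak operations per second, assuming one MAC per element per cycle."""
    return rows * cols * ops_per_mac * clock_hz


def input_bandwidth_bytes(rows: int, clock_hz: float, bits_per_value: int = 4) -> float:
    """Bytes per second needed to feed one input value per row per cycle (assumption)."""
    return rows * bits_per_value / 8 * clock_hz


# Figures from the article: a 1,000 x 1,000 array clocked at about 56 GHz.
peak_ops = systolic_peak_ops(1000, 1000, 56e9)
input_bw = input_bandwidth_bytes(1000, 56e9)

print(f"peak: {peak_ops / 1e15:.0f} petaOPS")
print(f"input feed: {input_bw / 1e12:.0f} TB/s")
```

Under these assumptions the array would need on the order of tens of terabytes per second of input values, which gives some intuition for why the rest of the die is given over to SRAM and vector units.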
But because the matrix-matrix multiplication is done optically, Bowen notes that the only power consumed by the tensor core is what's needed to drive the opto-electrical conversion from digital to analog and back again.
Neurophos says its first OPU, codenamed the Tulkas T100, will feature a dual-reticle design equipped with 768 GB of HBM, and will be capable of 470 petaOPS while consuming 1 to 2 kilowatts of power under load.
As impressive as all this sounds, it's important to remember that these figures are more like goal posts at this point. The chip is still in active development with full production not expected to begin until mid-2028. Even then, Bowen doesn't expect it to ship in large volumes. "We're talking thousands of chips. Not tens of thousands of chips."
While Neurophos believes its optical tensor cores can address a broad array of AI inference workloads, it expects its first chip will be used primarily as a prefill processor.
As we've previously discussed, LLM inference can be broken into two phases: a compute-intensive prefill stage in which input tokens are processed, and a memory-bandwidth-bound decode stage in which output tokens are generated.
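The difference between prefill and token generation (decode) can be illustrated with a rough arithmetic-intensity estimate: operations performed per byte of model weights read from memory. The Python sketch below is a simplification under stated assumptions. It looks at a single weight matrix, ignores activation and KV-cache traffic, assumes 4-bit (0.5 byte) weights, and uses a hypothetical layer width of 8,192 and a hypothetical 4,096-token prompt. None of these numbers come from Neurophos or Nvidia.

```python
def arithmetic_intensity(tokens: int, d_in: int, d_out: int, bytes_per_weight: float = 0.5) -> float:
    """Operations per byte of weights read for a (tokens x d_in) @ (d_in x d_out) matmul."""
    ops = 2 * tokens * d_in * d_out              # one multiply plus one add per weight per token
    weight_bytes = d_in * d_out * bytes_per_weight
    return ops / weight_bytes


# Hypothetical layer width and prompt length, chosen only for illustration.
D = 8192
prefill = arithmetic_intensity(tokens=4096, d_in=D, d_out=D)  # whole prompt processed at once
decode = arithmetic_intensity(tokens=1, d_in=D, d_out=D)      # one new token per step

print(f"prefill: {prefill:.0f} ops per weight byte")
print(f"decode:  {decode:.0f} ops per weight byte")
```

In this simplified model, prefill reuses each weight across every token in the prompt, so it is limited by compute, while decode touches every weight to produce a single token, so it is limited by how fast memory can deliver them. Batching several requests together raises the decode figure, but the gap remains large.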
Over the past year or so, we've seen chip designers like Nvidia disaggregate prefill and decode into separate pools of GPUs. For its latest generation of GPUs, Nvidia has developed a dedicated prefill accelerator that it calls Rubin CPX.
Bowen envisions the Tulkas T100 filling a role similar to that of Rubin CPX. "The current vision, which is subject to change, is basically we would put one rack of ours, which is 256 of our chips, and that would be paired with something like an NVL576 rack," he said.
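Taking the quoted figures at face value, the rack-level numbers follow from simple multiplication. The short Python sketch below combines the 256 chips per rack with the claimed 470 petaOPS and 1 to 2 kilowatts per chip. It counts chip power only and assumes perfect scaling across chips; host systems, networking, and cooling are not included, and the variable names are purely illustrative.

```python
CHIPS_PER_RACK = 256          # from Bowen's description of one rack
PETAOPS_PER_CHIP = 470        # claimed target for the Tulkas T100
CHIP_POWER_KW = (1.0, 2.0)    # claimed power range under load

# Aggregate throughput, assuming perfect scaling across chips.
rack_exaops = CHIPS_PER_RACK * PETAOPS_PER_CHIP / 1000

# Chip power only, at the low and high ends of the claimed range.
rack_power_kw = tuple(CHIPS_PER_RACK * kw for kw in CHIP_POWER_KW)

# Efficiency per chip: 1 petaOPS = 1,000 TOPS and 1 kW = 1,000 W.
tops_per_watt = tuple(PETAOPS_PER_CHIP * 1000 / (kw * 1000) for kw in CHIP_POWER_KW)

print(f"rack throughput: {rack_exaops:.2f} exaOPS")
print(f"rack chip power: {rack_power_kw[0]:.0f} to {rack_power_kw[1]:.0f} kW")
print(f"efficiency: {tops_per_watt[1]:.0f} to {tops_per_watt[0]:.0f} TOPS/W")
```

These are targets rather than measurements, so the results are best read as what the claims imply if they hold.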
Long-term, Bowen aims to tackle the decode phase as well, but notes that a variety of technologies, including co-packaged optics, will need to be developed before the startup is ready to take on token generation.
While the Tulkas T100 won't ship until at least 2028, Bowen says the company is actively working on a proof of concept (PoC) chip to validate the compute and power densities it's claiming.
This week, Neurophos completed a $110 million Series A funding round led by Gates Frontier, with participation from Microsoft's venture fund and other investors, which Bowen says will fund development of this PoC. ®