Deepinfra lands $107M in funding to build out its dedicated inference cloud for open-source models
by Mike Wheatley · SiliconANGLE

Dedicated inference cloud startup Deepinfra Inc. is looking to expand its global capacity after raising $107 million in a Series B round of funding led by 500 Global and Georges Harik, one of Google LLC’s earliest engineers.
The round saw participation from a number of heavy hitters, including Nvidia Corp., Samsung Next (the venture capital arm of Samsung Electronics Co. Ltd.) and Super Micro Computer Inc., along with A.Capital Ventures, Crescent Cove, Felicis, Peak6 and Upper90.
Deepinfra says it’s trying to redesign the cloud infrastructure for artificial intelligence workloads as the industry shifts from experimental chatbots to production-scale “agentic workflows,” or systems that can do work autonomously without human intervention. It says the process of inference – running AI models in production – is hugely inefficient, primarily because traditional cloud platforms were never designed to support such workloads.
The startup believes general-purpose cloud platforms struggle with the “always-on” nature of AI agents, which often make dozens or even hundreds of model calls to complete a single task. As a result, latency becomes unpredictable and costs balloon, which can derail AI projects before they ever reach production.
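To see why agentic workloads strain inference infrastructure, consider that an agent’s end-to-end latency is the sum of its sequential model calls, so even rare slow calls compound. The sketch below is purely illustrative: the latency figures and the 5% slow-call rate are assumptions, not Deepinfra’s numbers.

```python
# Illustrative sketch (assumed numbers, not Deepinfra's): an agent chaining
# sequential model calls pays the sum of per-call latencies, so occasional
# slow calls make total task latency hard to predict.
import random

random.seed(0)  # reproducible illustration

def call_latency_ms() -> float:
    # Assume 300 ms typical, with a 5% chance of a 10x slow call
    # (queueing, cold cache, etc.) -- hypothetical distribution.
    return 3000.0 if random.random() < 0.05 else 300.0

def task_latency_ms(num_calls: int) -> float:
    # Sequential calls: total latency is the sum over the chain.
    return sum(call_latency_ms() for _ in range(num_calls))

# One chatbot-style turn vs. a 50-call agentic task:
chat = task_latency_ms(1)
agent = task_latency_ms(50)
print(f"single call: {chat:.0f} ms, 50-call agent task: {agent:.0f} ms")
```

With 50 chained calls, a few unlucky draws can add seconds to a task, which is the unpredictability the article describes.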
Deepinfra aims to solve this by building a kind of “token factory” that treats inference as a primary process rather than a secondary cloud service. It was founded by the same team of engineers that created the popular messaging application imo, which was scaled up to more than 200 million users globally.
Instead of renting “spot” capacity from third parties, the startup operates its own hardware across eight data centers in the U.S. This allows it to control the full infrastructure stack, from the graphics processing units up to the application programming interfaces, enabling it to squeeze more performance out of its hardware. The company leverages Nvidia’s Dynamo distributed-inference platform, along with its Blackwell and Vera Rubin GPUs, to deliver up to 20 times greater inference cost efficiency.
Deepinfra is especially interested in agentic AI, because it says these systems are much more resource-intensive and costly than traditional generative AI chatbots. Already, more than 30% of the token volume on its platform is driven by autonomous agents.
At present, its platform supports more than 190 open-source AI models, including Nvidia’s Nemotron family. It also offers a zero-data retention policy for enterprises wary of sending sensitive information to the cloud.
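Platforms like this typically expose their hosted open-source models behind an OpenAI-compatible HTTP endpoint. The sketch below builds such a chat-completions request with only the Python standard library; the endpoint URL and model name are assumptions for illustration, and the request is constructed but deliberately not sent.

```python
# Sketch of calling an OpenAI-compatible chat-completions endpoint.
# The URL and model name below are assumptions for illustration only.
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"  # assumed

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) a chat-completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("meta-llama/Meta-Llama-3-8B-Instruct", "Hello", "YOUR_KEY")
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```

Because the request shape matches the OpenAI API, existing client code can often be pointed at a self-hosted or third-party inference cloud by swapping the base URL and key.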
Co-founder and Chief Executive Nikola Borisov said he started the company four years ago because he was convinced that inference would become the dominant driver of enterprise AI workloads, and he believes that is now the case.
“What’s happening now is incredibly exciting, with open-source models rapidly reaching parity with proprietary systems, unlocking a wave of innovation at a fraction of the cost and enabling widespread adoption,” he said. “At the same time, agent-based systems are driving continuous, high-volume demand. Inference is no longer a thin layer – it’s the system constraint that will define the majority of workloads.”
Tony Wang of 500 Global said the demand for AI inference is going through the roof, and engineers and developers are finding that they need faster, more flexible and reliable infrastructure to support it. “Deepinfra’s team has already proven it can build and operate distributed systems at global scale, and we believe purpose-built inference will be fundamental to the next phase of AI,” he said.