Growing void between enterprise and frontier AI puts open weights models in the spotlight
Most customers don't need the biggest baddest models, just ones that work, are cheap, and won't pirate their proprietary data
by Tobias Mann · The Register

FEATURE Spring has sprung and that means another wave of open weights AI models from the likes of Google, Microsoft, Alibaba, and Nvidia. But this time feels a bit different.
In the past, these models have felt a bit like toys: research projects and proofs of concept that, while impressive for their size or innovation, still fell far short of OpenAI, Anthropic, or Google's top models.
But Qwen 3.5, Google's Gemma 4, and Microsoft's MAI speech and image models are a bit different. These models feel less like proofs of concept and more like enterprise products.
"We've moved from interesting to now serious enterprise platforms," Andrew Buss, senior research director at IDC, told El Reg.
The models underscore a stark reality: the gulf between enterprise and frontier AI has grown considerably over the past few years, and the more powerful models are beyond the means of many enterprises.
"I think we are seeing a split," Buss said. "We're getting these larger, holistic models that are almost trying to be everything to everyone. But then we're also seeing the rise of smaller, more specialized models that are tailored and geared around more specific outcomes or query types."
Frontier models' sovereign AI blind spot?
Accessing OpenAI's or Anthropic's top models requires exposing potentially sensitive customer data or intellectual property to an API or chatbot.
Both companies insist that they don't use enterprise or API data to train their models, but these are the same companies that have repeatedly been dragged into court for violating copyright.
Enterprises may be willing to use Gemini or Copilot to draft emails or sales proposals, but giving them access to proprietary data is a no-go.
The alternative isn't great. There are a handful of large Chinese models from the likes of DeepSeek, Alibaba, Moonshot AI, and MiniMax that can get you within spitting distance of OpenAI or Anthropic. However, many of these models still require substantial infrastructure investments. Even Nvidia and AMD's enterprise-focused systems will set you back somewhere between $250,000 and $500,000 each.
But depending on the use case, enterprises don't necessarily need a frontier class model. What matters is whether the model is good enough to deliver the desired outcome, Buss said.
For their size, the latest open models from Google, Alibaba, Microsoft, and Nvidia are not only remarkably competitive, but also relatively cheap to run.
On Arena AI's text leaderboard, which allows the public to vote on which models generate the best outputs, Google's Gemma 4 31B (which refers to the 31 billion parameters it incorporates) is now the fourth-highest ranked open model, right behind Z.AI's GLM-5 and Moonshot AI's Kimi 2.5 Thinking, which at 744 billion and 1 trillion parameters, are more than an order of magnitude larger.
"There is an appetite and desire for AI in companies of all sizes, and we think there is a lot of relevance for companies in the mid market," Buss said. "For that, we need a range of both infrastructure hardware as well as the types of models that can run on them."
Google's new 31B-parameter model can easily be run at full 16-bit precision on a single RTX Pro 6000 Blackwell with plenty of room left over to support a reasonable number of concurrent requests and interactivity.
That's a card that routinely sells for between $8,000 and $10,000. It's a similar story with Qwen 3.5, where all but the two largest models would fit comfortably on a single GPU.
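The claim that a 31B-parameter model fits on a single card comes down to simple arithmetic. As a rough sketch (illustrative figures only; real deployments also need memory for activations, KV cache, and framework overhead):

```python
# Back-of-the-envelope VRAM estimate for serving a model at 16-bit precision.
# Numbers are illustrative; actual usage adds activation and runtime overhead.

def weight_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

gemma_31b = weight_vram_gb(31)      # ~62 GB at FP16/BF16
card_vram = 96                      # RTX Pro 6000 Blackwell carries 96 GB
headroom = card_vram - gemma_31b    # left over for KV cache and batching

print(f"weights: {gemma_31b:.0f} GB, headroom: {headroom:.0f} GB")
```

At two bytes per parameter, the weights alone consume about 62 GB, leaving roughly a third of the card's memory for serving concurrent requests.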
In many cases, these smaller enterprise-focused models may not even need that much compute, Buss notes. "We don't often need things like GPU acceleration. Even a lot of these AI workloads, ideally, can be loaded up and run on a fairly modern CPU based server," he said.
Because these models are smaller and more focused, they need few, if any, additional resources to customize using techniques like QLoRA fine-tuning or reinforcement learning.
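Part of why LoRA-style fine-tuning is so cheap: instead of updating every weight, you train a pair of small low-rank factors alongside each frozen weight matrix. A minimal sketch with illustrative dimensions (not any specific model's shapes):

```python
# LoRA replaces the update to a full d_out x d_in weight matrix with two
# trainable low-rank factors: B (d_out x r) and A (r x d_in). The fraction
# of parameters actually trained is tiny when r is small.

def lora_trainable_fraction(d_in: int, d_out: int, r: int) -> float:
    full_params = d_in * d_out          # frozen base weights
    lora_params = r * (d_in + d_out)    # trainable adapter weights
    return lora_params / full_params

# A 4096x4096 projection with rank-16 adapters:
frac = lora_trainable_fraction(4096, 4096, 16)
print(f"trainable fraction: {frac:.4%}")
```

With rank 16 on a 4096-square matrix, under 1 percent of the parameters are trainable, which is why a single workstation GPU is often enough. QLoRA goes further by keeping the frozen base weights quantized to 4 bits.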
What's changed?
So what's changed to make these models so much more capable? Quite a bit, actually.
The past year has seen a flurry of advancements not only in model training, but also in the frameworks necessary to harness them.
You may recall the market-tumbling excitement around DeepSeek R1, which was among the first open-weights frontier models to employ reinforcement learning (RL) to replicate OpenAI o1's chain-of-thought reasoning, trading time for higher quality outputs.
This approach, now referred to as test-time scaling, has helped smaller models make up for their lower parameter counts by "thinking" for longer.
The past year also saw more models add support for vision and audio processing, enabling them to analyze visual data, while smarter architectures and better compression techniques have further reduced the compute and memory resources required to run them.
But perhaps the biggest change is that the software used to harness these models to get actual work done has matured considerably.
These frameworks mean that models aren't limited to training data; they can retrieve information from the web, databases, and APIs, and take action based on the results through tool calls.
Google and Nvidia's models have been trained specifically with function calling in mind. In other words, they're not really intended to be used as standalone models. Some models, like Microsoft's MAI, take this to another level by optimizing for specific domains like speech recognition and image generation.
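Function calling in this context usually means the model emits a structured request that a surrounding harness executes before feeding the result back. A minimal sketch of that loop, with a hypothetical tool name and call format (not any vendor's actual schema):

```python
import json

# Minimal tool-calling harness: the model emits a JSON "tool call", the
# harness looks up the named function, runs it, and returns the result so
# it can be appended to the conversation. Tool names are illustrative.

TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def dispatch(tool_call_json: str) -> dict:
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A model prompted with the tool's schema might emit something like:
result = dispatch('{"name": "lookup_order", "arguments": {"order_id": "A123"}}')
print(result)
```

Models trained with function calling in mind are rewarded for producing exactly this kind of well-formed, schema-conformant output, which is why they behave better inside agent frameworks than as standalone chatbots.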
The challenge then becomes how to choose the right model for the job, Buss notes, suggesting that some kind of recommendation system will likely be required.
What do the model devs get out of this?
Letting enterprises run local agents with access to proprietary data isn't entirely altruistic; model developers benefit too. For one, while these models are open, there is still a degree of lock-in: any agents built with them will have system prompts and tooling tuned to that specific architecture.
It's about being able to reach markets that bigger models can't, Buss explained.
"If you have people developing using your technologies and approaches and IP, they're more likely to migrate up and stay in your ecosystem," he said. "It's a matter of basically having a product at the entry point... If you catch them young, as they grow, they will tend to keep with you over time."
Beyond the ecosystem play, these local models could help to drive down datacenter power consumption. The idea is not unlike OpenAI's GPT-5, which isn't one model but several, between which prompts are dynamically routed based not only on complexity but also on different policies.
The same logic could be applied in a disaggregated fashion, where a routing model running locally could direct prompts requiring access to proprietary data to a local LLM, while less sensitive requests could be offloaded to an API provider.
"I think there's a spectrum of solutions available, everything from fully private on-prem to sort of dedicated at the point of use in colocation datacenters, dedicated in the public cloud, to a shared environment for cost savings if your workload or prompts are not sensitive," Buss said. ®