Databricks co-founder Matei Zaharia wins ACM Prize and declares AGI is already here

In short: Matei Zaharia, the Berkeley computer science professor and Databricks co-founder who created Apache Spark, has won the 2026 ACM Prize in Computing for his foundational contributions to distributed data systems and AI infrastructure. The $250,000 prize, funded by an Infosys endowment, is one of computer science’s most prestigious mid-career honours. Zaharia is donating the prize to charity. In an interview following the announcement, he argued that AGI has already arrived (“it’s just not in a form that we appreciate”) and that the field should stop benchmarking AI against human cognition.

From PhD thesis to global infrastructure

Zaharia began building Apache Spark as a doctoral student at UC Berkeley in 2009, conceiving it as a faster alternative to Hadoop MapReduce, which had become the default framework for large-scale distributed data processing but was burdened by slow disk-based I/O between stages. Spark moved intermediate computation into memory, cutting processing times for iterative workloads (machine learning training, graph processing, stream analysis) from hours to minutes or seconds. The performance gap was decisive enough that Spark effectively superseded MapReduce for most analytical workloads within a few years of its release, and it remains one of the most widely deployed data processing frameworks in the world.

Zaharia’s doctoral dissertation on Spark won the ACM Doctoral Dissertation Award in 2014, and the project became the seed of Databricks, the data and AI company he co-founded in 2013 with six Berkeley colleagues. Databricks reached a $134 billion valuation in December 2025 following its Series L funding round, and disclosed a revenue run rate of $5.4 billion in February 2026, growing at more than 65% year on year. The ACM, in its prize citation, credited Zaharia with “visionary development of distributed data systems and computing infrastructure, which has enabled large-scale machine learning, analytics, and AI at global scale.”

The permissive open-source model Zaharia helped popularise has also spread well beyond Spark itself: Apache Spark is licensed under Apache 2.0, the same licence Google used last week for its Gemma 4 open-weight model family, and that licence has become the default for AI model and tool releases that aim for broad commercial adoption.
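
The in-memory pattern described above is easy to see in a few lines of PySpark: load a dataset once, pin it in memory, and iterate over it without re-reading from disk. A minimal illustrative sketch (the file path and column names are hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iterative-demo").getOrCreate()

# Read once, then cache: later passes hit memory instead of disk.
# Under MapReduce, every pass would have round-tripped through HDFS.
ratings = spark.read.parquet("/data/ratings.parquet").cache()  # hypothetical path

# An iterative workload: repeated aggregation over the cached dataset.
for _ in range(10):
    avg_ratings = (
        ratings.groupBy("movie_id")  # hypothetical column
        .agg(F.avg("rating").alias("avg_rating"))
        .collect()
    )
```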

Delta Lake, MLflow, and the data lakehouse

Zaharia’s contributions did not stop at Spark. As data infrastructure moved to the cloud and organisations began storing vast quantities of unstructured data in object stores such as Amazon S3, a new set of problems emerged: cloud data lakes were fast and cheap but unreliable, with no transactional guarantees, no consistent schema enforcement, and no principled way to handle concurrent writes. Zaharia co-developed Delta Lake to solve this, bringing ACID transactional semantics to cloud object stores and enabling a new architectural pattern, the data lakehouse, that combined the cost and scale advantages of a data lake with the consistency and governance properties of a traditional data warehouse. The lakehouse architecture is now Databricks’ core commercial product and has been widely adopted across enterprise data engineering.

A third project, MLflow, addressed the operational chaos that had emerged as machine learning moved from research into production. Teams building ML models had no consistent way to track experiments, version models, or manage deployments across the diverse set of tools (scikit-learn, TensorFlow, PyTorch, XGBoost) that a single organisation might use simultaneously. MLflow provided a structured lifecycle framework that became one of the leading platforms for operationalising AI at scale.
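
MLflow’s experiment-tracking core, which solves the problem described above, fits in a few lines. A minimal sketch, assuming the mlflow Python client and scikit-learn are installed (the experiment name, model choice, and hyperparameter are illustrative):

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# A toy dataset standing in for real training data.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("churn-model")  # illustrative experiment name

with mlflow.start_run():
    model = RandomForestClassifier(max_depth=6, random_state=0).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    # Everything logged here is recorded against this run, so experiments
    # can be compared, reproduced, and later promoted to deployment.
    mlflow.log_param("max_depth", 6)
    mlflow.log_metric("auc", auc)
    mlflow.sklearn.log_model(model, "model")
```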

Agents, DSPy, and the current research frontier

Zaharia’s recent research has shifted from data infrastructure to the systems that make AI agents more reliable and capable. He is a co-author of DSPy, an open-source framework that automatically optimises the prompts and parameters used to instruct language models for specific tasks, replacing the manual prompt engineering that has become a significant source of brittleness in production AI systems. A related project, GEPA, extends this approach to agent quality, focusing on how to improve the reliability of multi-step AI workflows where errors compound across sequential decisions. The common thread across Zaharia’s career is systems thinking applied to the parts of AI that are not the model itself: the data pipelines, the experiment tracking, the deployment infrastructure, and now the agent orchestration layer.

The enterprise AI deployment ecosystem that has grown around these tools is now a significant commercial market in its own right. Infosys, which funds the ACM Prize through its endowment, is also one of the anchor partners in Anthropic’s Claude Partner Network, launched in March 2026 with $100 million committed to enterprise AI deployment, a market that would not exist in its current form without the data and ML infrastructure Zaharia’s open-source work made accessible.

“The thing that I’m most excited about,” Zaharia said in the TechCrunch interview, “is what I’d call AI for search, but specifically for research or engineering.” He envisions students and researchers using AI to simulate molecular-level changes in biological systems and predict their outcomes: autonomous scientific investigation at a scale and speed that no human team could replicate.
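
The DSPy approach mentioned above is declarative: instead of hand-writing prompts, you specify a typed signature and let an optimiser tune the instructions and examples behind it. A minimal sketch, assuming the open-source dspy package and access to an OpenAI-compatible model (the model identifier and question are illustrative):

```python
import dspy

# Configure a language model; the identifier is illustrative.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class AnswerQuestion(dspy.Signature):
    """Answer a factual question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

qa = dspy.Predict(AnswerQuestion)
print(qa(question="What did Spark improve over MapReduce?").answer)

# Instead of hand-tuning the prompt behind `qa`, a DSPy optimiser
# (e.g. dspy.MIPROv2, or the GEPA work mentioned above) can search for
# better instructions and examples given a metric and a small training set.
```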

“AGI is here already”: the claim and what he means

The most attention-generating moment in the announcement was not the prize itself but a statement Zaharia made about the state of AI. “AGI is here already,” he told TechCrunch. “It’s just not in a form that we appreciate.” The claim is provocative, but his elaboration clarifies what he is and is not saying. The conventional definition of artificial general intelligence (a system capable of performing any intellectual task that a human can) sets up a comparison between AI and human cognition that Zaharia argues is the wrong frame. “We should stop trying to apply human standards to these AI models,” he said.

His reasoning is that the capabilities of current AI systems are structurally different from human intelligence rather than simply weaker. A human can only pass the bar exam after integrating vast amounts of legal knowledge through years of study; an AI can ingest the same corpus in minutes. If it then answers legal questions correctly, Zaharia argues, dismissing that as “not really intelligence” because the knowledge was acquired differently is an arbitrary standard.

The debate over how to define and measure AI progress is shaping competitive strategy at the top labs: Demis Hassabis recently described how Google DeepMind restructured itself to accelerate research pace, characterising the current AI race as “ferocious” and “historically intense”. Zaharia’s redefinition of AGI is less a triumphalist claim than a methodological argument: the field’s insistence on measuring AI against human benchmarks may be causing it to misunderstand what the systems it has already built are actually capable of.

The commercial evidence for that capability is accumulating rapidly: Anthropic, whose models run on the data infrastructure that Zaharia’s open-source work helped normalise, reached a $30 billion annual revenue run rate earlier this year. The year 2025 marked AI’s shift from research novelty to operational infrastructure, and Zaharia, whose career has always been about the layer beneath the layer everyone else is looking at, has been building the foundations for that shift since his PhD.
