Q&A: What AI actually does in diffusion models for drug design

by

Lisa Lock

scientific editor

Robert Egan

associate editor


Isolated fragments are connected by a linker generated by the diffusion model. DiffSHAPer rationalizes the process by determining which atoms favor (or oppose) the generation. Credit: Andrea Mastropietro

Artificial intelligence in the form of diffusion models is being used to design new drugs. What exactly does AI do in this context? Dr. Andrea Mastropietro and Prof. Dr. Jürgen Bajorath from Life Science Informatics at the University of Bonn and the Lamarr Institute for Machine Learning and Artificial Intelligence have published an article on this topic in Cell Reports Physical Science. Below, they answer questions about the method.

What is your research about?

Diffusion models have mainly been used for image and video generation. Recently, their use has been extended to new domains such as chemistry, where they generate new molecules. For our analysis, we aimed for generality and set out to explain diffusion models for linker design, covering molecules with different applications.

What is a linker?

A linker is a substructure of a molecule that connects two or more disconnected fragments of atoms. Linker design is an important task in drug development, as it plays a central role in the design of effective molecules with specific properties.

How do diffusion models work in principle?

Diffusion models learn a data distribution and generate new data by sampling from that distribution. The diffusion model itself is an advanced AI model. We try to understand its generative process.

How does 'noise' come into play?

Adding and removing noise is the hallmark of diffusion models. Starting from a sample in the dataset (an image or, in our case, a molecule), diffusion models add "noise" until the original sample is "destroyed", like the transition from a detailed image to TV static. The model then learns how the added noise must be removed to recover a valid sample, which lets it generate a new image (or molecule).
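
The noising half of this process can be sketched in a few lines. The example below is a minimal illustration, not the model used in the study: it assumes the common variance-preserving formulation with a linear noise schedule, and the function names (`alpha_bar`, `add_noise`), the schedule parameters, and the toy coordinates are all hypothetical choices made for illustration.

```python
import math
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after t noising steps.

    Assumes a linear beta schedule; the article does not say which
    schedule the studied model uses.
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(coords, t, rng, num_steps=1000):
    """Sample x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps for each atom coordinate."""
    a = alpha_bar(t, num_steps)
    return [
        tuple(math.sqrt(a) * c + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
              for c in atom)
        for atom in coords
    ]

rng = random.Random(0)
# Toy 3D coordinates for three atoms (made-up values, not real chemistry).
molecule = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0)]

for t in (0, 100, 500, 1000):
    noisy = add_noise(molecule, t, rng)
    print(f"t={t:4d}  signal kept={alpha_bar(t):.4f}  first atom={noisy[0]}")
```

At `t = 0` the coordinates are unchanged; by the final step almost none of the original signal remains, which is the "TV static" end of the process. A trained model would then be asked to reverse these steps, which is not shown here.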

How did you proceed?

For our study, we selected a state-of-the-art diffusion model for linker design and developed a novel explainability strategy that extends a well-known concept in explainable artificial intelligence: Shapley values. For our method, termed 'diffusion model Shapley additive explainer' (DiffSHAPer), we adapted the widely used Shapley value formalism for explaining machine learning predictions to diffusion models. Our goal was to find which fragment atoms were most influential for linker generation.
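
To give a feel for what a Shapley value measures, here is a minimal sketch of the classic exact computation on a toy problem. It is not DiffSHAPer: the article does not describe how the method scores a generation, so the value function `toy_value` and the atom labels below are purely hypothetical stand-ins. The idea is to average each atom's marginal contribution over all subsets of the other atoms.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

def toy_value(kept_atoms):
    """Hypothetical score for a set of kept fragment atoms.

    Atoms A and B only matter together; atom C adds a small
    independent contribution. These numbers are invented.
    """
    score = 0.0
    if {"A", "B"} <= kept_atoms:
        score += 1.0
    if "C" in kept_atoms:
        score += 0.2
    return score

atoms = ["A", "B", "C"]
phi = shapley_values(atoms, toy_value)
print(phi)
```

In this toy game the credit for the joint effect of A and B is split evenly between them, and the attributions sum to the score of the full atom set. Exact enumeration grows exponentially with the number of atoms, so it is only practical for very small examples like this one.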

What is the most important finding?

We found that, to generate chemically valid linkers, diffusion models do not learn or exploit chemical principles; instead, they rely mostly on distance constraints between atoms. They therefore pick up recurrent statistical patterns in the data without learning generalizable chemical rules.

What was the biggest challenge?

From a computational perspective, running inference and explaining the generations of diffusion models are time-consuming tasks. From a methodological perspective, our approach is new, so we had to find the best way to present our results effectively.

Is there an application?

Our methodology can be used to understand what molecular diffusion models learn. In the specific case of linker design, it helps determine what drives the generation of the linker. Linkers are important in drug design, as they can improve critical molecular properties (such as potency and stability). A linker generated solely from distance and geometric constraints therefore does not guarantee optimized properties or practical chemical utility.

What are the next steps?

The first step would be to apply DiffSHAPer to molecular diffusion models tailored to different tasks. Future research will focus on developing models that include more chemical context in their internal reasoning.

Publication details

Andrea Mastropietro and Jürgen Bajorath, Explaining a molecular diffusion model, Cell Reports Physical Science (2026). DOI: 10.1016/j.xcrp.2026.103270. www.cell.com/cell-reports-phys … 2666-3864(26)00176-1

Journal information: Cell Reports Physical Science

Provided by University of Bonn