New artificial intelligence model maps how genes work together inside cells

News-Medical

Scientists at the Icahn School of Medicine at Mount Sinai have created a new artificial intelligence (AI) model that helps reveal how genes function together inside human cells, offering a powerful new way to understand biology and disease.

The study, published in the May 21 online issue of Patterns, a Cell Press Journal [https://doi.org/10.1016/j.patter.2026.101565], introduces a gene set foundation model (GSFM) designed to learn patterns in how genes are grouped and function across thousands of biological contexts. The work draws inspiration from advances in large language models (LLMs) such as ChatGPT, which learn how words gain meaning depending on their context. In a similar way, a GSFM learns how genes behave differently depending on their cellular "context."

The model provides a new way to understand the structural and functional organization of genes and their products inside human cells. This improved understanding could eventually support the development of better diagnostics, biomarkers, and therapies. By mapping how genes relate to one another across many biological situations, the GSFM creates a reference framework that can help scientists interpret complex multi-omics datasets more effectively, say the investigators.

"The organization of genes within cells remains one of the major unsolved questions in biology. The GSFM helps address this by learning from millions of gene groupings derived from published research and gene expression datasets," says Dr. Ma'ayan.

The model can:

  • Help identify the function of poorly understood genes without immediate laboratory experiments
  • Highlight genes involved in disease processes
  • Suggest potential new drug targets and biomarkers
  • Provide a reusable knowledge system for many types of biomedical research data analysis tasks, for example improved gene set enrichment analysis

In essence, say the investigators, GSFM offers a new "map" of how genes work together in different contexts.

To build the model, the researchers compiled millions of gene sets from published scientific studies and gene expression datasets. In total, the system learned from hundreds of thousands of independent research efforts.

The AI model was trained in a way similar to solving a puzzle: it was given part of a gene set and asked to predict the missing pieces. Over time, it learned underlying patterns that describe how genes are grouped and interact.
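To make the puzzle analogy concrete, here is a minimal sketch of the idea: hide part of a gene set, then rank candidate genes for the missing slots. It is an illustration under stated assumptions, not the GSFM itself. The gene labels (G1, G2, ...) are made up, the function names are hypothetical, and the ranking step is a simple co-occurrence count standing in for a trained neural network.

```python
import random
from collections import Counter


def mask_gene_set(gene_set, hide_fraction=0.3, seed=0):
    """Split one gene set into a visible part (input) and a hidden part (target)."""
    genes = sorted(gene_set)          # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(genes)
    n_hidden = max(1, int(len(genes) * hide_fraction))
    return set(genes[n_hidden:]), set(genes[:n_hidden])


def rank_candidates(visible, training_sets):
    """Toy stand-in for a trained model (an assumption, not the published method):
    score each gene by how strongly it co-occurs with the visible genes."""
    scores = Counter()
    for gene_set in training_sets:
        overlap = len(visible & gene_set)
        if overlap:
            for gene in gene_set - visible:
                scores[gene] += overlap
    return [gene for gene, _ in scores.most_common()]


# Made-up gene sets for illustration only.
training_sets = [{"G1", "G2", "G3", "G4"}, {"G1", "G2", "G4"}, {"G5", "G6"}]
visible, hidden = mask_gene_set({"G1", "G2", "G3", "G4"}, hide_fraction=0.25)
ranking = rank_candidates({"G1", "G2"}, training_sets)
```

A real foundation model would replace the counting step with learned representations, but the input and target have the same shape: visible genes in, hidden genes out.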

The AI model was then benchmarked against other approaches and demonstrated strong performance, including the ability to identify gene-gene and gene-function relationships before they were confirmed experimentally. To evaluate this, the model was trained using gene sets from publications up to a defined cutoff date, and then tested on whether it could predict discoveries reported in studies published after that cutoff date.
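The cutoff-date evaluation described above can be sketched in a few lines. This is an assumed, simplified setup rather than the study's actual benchmark: the record format, the function names, the dates, and the gene labels are all hypothetical. The sketch splits gene sets by publication date and lists the gene pairs that appear together only after the cutoff, which are the "future discoveries" a model would be asked to anticipate.

```python
from datetime import date
from itertools import combinations


def temporal_split(records, cutoff):
    """Split (publication_date, gene_set) records into pre- and post-cutoff sets."""
    train = [genes for published, genes in records if published <= cutoff]
    test = [genes for published, genes in records if published > cutoff]
    return train, test


def gene_pairs(gene_sets):
    """All unordered gene pairs that co-occur in at least one gene set."""
    return {frozenset(pair) for genes in gene_sets
            for pair in combinations(sorted(genes), 2)}


def novel_pairs(train, test):
    """Pairs seen together after the cutoff but never before it."""
    return gene_pairs(test) - gene_pairs(train)


# Made-up records for illustration only.
records = [
    (date(2019, 1, 1), {"G1", "G2"}),
    (date(2020, 6, 1), {"G2", "G3"}),
    (date(2023, 3, 1), {"G1", "G2", "G3"}),
]
train, test = temporal_split(records, cutoff=date(2021, 1, 1))
held_out = novel_pairs(train, test)
```

Splitting by date rather than at random keeps later findings out of the training data, so a correct prediction cannot simply be a memorized result.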

"Unlike previous biological AI models that primarily rely on gene expression data, our GSFM is uniquely trained on gene sets, a different and largely underused type of biological information," says Dr. Ma'ayan. "This approach allows the model to integrate diverse data from many diseases, experimental methods, and research conditions, creating a unified representation of gene relationships across biology."

GSFMs could enhance existing bioinformatics tools and improve the interpretation of data collected with omics technologies. One immediate application is in gene set enrichment analysis, a widely used method in molecular biology research. By improving how scientists interpret gene groupings, the model may help uncover new biological insights from both existing and future datasets.
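For readers unfamiliar with enrichment analysis, one common form is the over-representation test, which asks whether a list of genes overlaps a known gene set more than chance would predict. The sketch below shows that conventional calculation using the hypergeometric distribution. It does not show how a GSFM would change the analysis, and the function name and the toy numbers are assumptions for illustration.

```python
from math import comb


def overrepresentation_p(query, annotated, background_size):
    """Probability of an overlap at least as large as the observed one,
    assuming both sets are drawn from a background of `background_size` genes."""
    observed = len(query & annotated)
    n_query = len(query)
    n_annotated = len(annotated)
    total = comb(background_size, n_query)
    largest = min(n_query, n_annotated)
    # Upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_annotated, i) * comb(background_size - n_annotated, n_query - i)
        for i in range(observed, largest + 1)
    )
    return tail / total


# Toy example: 3 query genes, all inside a 3-gene annotated set, background of 10.
p_full_overlap = overrepresentation_p({"G1", "G2", "G3"}, {"G1", "G2", "G3"}, 10)
# No overlap at all gives a p-value of 1.
p_no_overlap = overrepresentation_p({"G4", "G5", "G6"}, {"G1", "G2", "G3"}, 10)
```

A small p-value suggests the overlap is unlikely to be a coincidence. In practice the test is repeated across many annotated gene sets and corrected for multiple comparisons.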

The research team plans to expand the system by combining GSFM with other AI foundation models. One goal is to integrate it with language-based models to generate natural-language explanations of gene functions. Another future direction is combining GSFM with drug-focused AI models, with the long-term aim of predicting how drugs interact with cells and supporting the design of new therapeutics.

Source:

Mount Sinai Health System
