Team develops open-source framework to accelerate health AI research

Medical Xpress

by Columbia University Irving Medical Center

edited by Lisa Lock, reviewed by Andrew Zinin

A research team led by Columbia University has developed an open-source framework designed to streamline and accelerate artificial intelligence research using health data, addressing longstanding challenges in data standardization, reproducibility, and collaboration across institutions.

The framework, called MEDS, introduces both a standardized data format and a growing ecosystem of interoperable tools intended to support the development and evaluation of machine learning models using clinical data. A study describing the framework was published in NEJM AI.

The researchers say the framework could help reduce technical barriers that currently slow health AI research and make it difficult for scientists to reproduce findings or compare models across studies and institutions.

"MEDS is a simple way to make all different sources of electronic health record (EHR) data look the same to your code, regardless of what hospital or clinic or EHR software system the data came from," says Matthew McDermott, Ph.D., assistant professor of biomedical informatics at Columbia University and study leader.

"MEDS lets us share code that we can use to train models on many different sites of care without needing to share sensitive patient data—and often without needing to even do the more challenging step of fully 'harmonizing' the data into a consistent clinical vocabulary. This infrastructure will allow researchers to spend less time rebuilding pipelines and more time answering clinically meaningful questions."

Standardizing health data for clinical AI research

Electronic health record data are often stored in institution-specific formats that require extensive preprocessing before they can be used for AI development. According to the study authors, these inconsistencies can create significant duplication of effort, limit collaboration, and hinder reproducibility.

MEDS addresses these issues by providing a lightweight, extensible standard for representing longitudinal clinical data in machine learning workflows. The framework also includes open-source tooling that supports data transformation, preprocessing, benchmarking, and model development.
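To make the idea of a shared representation concrete, the sketch below maps two differently shaped hospital exports into one event layout and then runs the same downstream code on both. It is an illustration only: the `Event` fields (`subject_id`, `time`, `code`, `numeric_value`), the two source row shapes, the `LAB//` and `DX//` code prefixes, and every function name are assumptions made for this example. None of them come from the article, and the sketch does not use the MEDS tooling itself.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One clinical observation in a shared, source-independent layout (illustrative)."""
    subject_id: int
    time: datetime
    code: str
    numeric_value: Optional[float] = None


def from_hospital_a(row: dict) -> Event:
    # Hypothetical site A export: lab results keyed by "pid", "ts", "lab", "result".
    return Event(
        subject_id=int(row["pid"]),
        time=datetime.fromisoformat(row["ts"]),
        code=f"LAB//{row['lab']}",
        numeric_value=float(row["result"]),
    )


def from_hospital_b(row: dict) -> Event:
    # Hypothetical site B export: diagnoses with separate "date" and "time" fields.
    return Event(
        subject_id=int(row["patient"]),
        time=datetime.fromisoformat(f"{row['date']}T{row['time']}"),
        code=f"DX//{row['dx_code']}",
    )


def timeline(events):
    """Group events by subject, with each subject's events in time order."""
    grouped = {}
    for ev in sorted(events, key=lambda e: (e.subject_id, e.time)):
        grouped.setdefault(ev.subject_id, []).append(ev)
    return grouped


# Made-up rows, for demonstration only.
site_a_rows = [
    {"pid": "101", "ts": "2024-01-05T08:30:00", "lab": "creatinine", "result": "1.2"},
]
site_b_rows = [
    {"patient": "101", "date": "2024-01-03", "time": "14:00:00", "dx_code": "I10"},
    {"patient": "202", "date": "2024-02-01", "time": "09:15:00", "dx_code": "E11"},
]

events = [from_hospital_a(r) for r in site_a_rows]
events += [from_hospital_b(r) for r in site_b_rows]

# Downstream code sees only Event objects, never the site-specific rows.
by_subject = timeline(events)
```

Only the two small adapter functions know anything about a particular site. Everything after them, here `timeline`, is written once against the shared layout, which is the kind of code reuse the researchers describe.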

The authors emphasize that MEDS was designed specifically for AI and machine learning applications, complementing rather than replacing existing clinical data standards.

The framework is intended to support a broad range of use cases in biomedical AI research, including predictive modeling, representation learning, multimodal modeling, and large-scale benchmarking studies. Because the ecosystem is open source, researchers across academia, health care, and industry can contribute tools and extensions.

"The big successes in AI have always been driven by the community coming together and being able to collaborate, often in a decentralized, open-source manner, on tools, model parts, and ultimately ecosystems that let us build larger models that scale to massive datasets," McDermott said. "These impressive results in MEDS are just reflecting the benefits you get when the community can share tools or abstract common parts of their pipelines out into a shared library and use them across everyone's data."

The study also highlights the importance of reproducibility and transparency in health AI development as machine learning models increasingly move toward clinical deployment.

The researchers say they hope MEDS will foster broader collaboration across institutions and accelerate innovation in clinical AI while promoting more transparent and reproducible science. Already, MEDS has been adopted across 21 institutions spanning 12 countries.

More information

Matthew B. A. McDermott et al., MEDS — An Emerging Data Standard and Ecosystem for Health AI Research, NEJM AI (2026). DOI: 10.1056/aira2501253

Citation: Team develops open-source framework to accelerate health AI research (2026, May 29) retrieved 30 May 2026 from https://medicalxpress.com/news/2026-05-team-source-framework-health-ai.html