Researchers develop open source framework for health AI research
News-Medical
The framework, called MEDS, introduces both a standardized data format and a growing ecosystem of interoperable tools intended to support the development and evaluation of machine learning models using clinical data.
The researchers say the framework could help reduce technical barriers that currently slow health AI research and make it difficult for scientists to reproduce findings or compare models across studies and institutions.
"MEDS is a simple way to make all different sources of electronic health record (EHR) data look the same to your code, regardless of what hospital or clinic or EHR software system the data came from. MEDS lets us share code that we can use to train models on many different sites of care without needing to share sensitive patient data - and often without needing to even do the more challenging step of fully 'harmonizing' the data into a consistent clinical vocabulary. This infrastructure will allow researchers to spend less time rebuilding pipelines and more time answering clinically meaningful questions."
Matthew McDermott, PhD, assistant professor of biomedical informatics at Columbia University and study leader
Standardizing health data for clinical AI research
Electronic health record data are often stored in institution-specific formats that require extensive preprocessing before they can be used for AI development. According to the study authors, these inconsistencies can create significant duplication of effort, limit collaboration, and hinder reproducibility.
MEDS addresses these issues by providing a lightweight, extensible standard for representing longitudinal clinical data in machine learning workflows. The framework also includes open-source tooling that supports data transformation, preprocessing, benchmarking, and model development.
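To make the idea of a shared representation concrete, here is a minimal Python sketch of two differently shaped site exports being mapped into one common event structure, so that the same downstream function can run on either. It is an illustration only and does not use the MEDS tooling: the `Event` class, its field names, the site export layouts, and the helper functions are all hypothetical assumptions, not details taken from the article.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One row of a hypothetical shared event schema (field names are illustrative)."""
    subject_id: int
    time: datetime
    code: str
    numeric_value: Optional[float] = None


def from_site_a(row: dict) -> Event:
    # Hypothetical site A export: one lab result per row, ISO 8601 timestamps.
    return Event(
        subject_id=int(row["patient"]),
        time=datetime.fromisoformat(row["drawn_at"]),
        code=f"LAB//{row['test']}",
        numeric_value=float(row["result"]),
    )


def from_site_b(row: dict) -> Event:
    # Hypothetical site B export: different column names and a US-style timestamp.
    return Event(
        subject_id=int(row["mrn"]),
        time=datetime.strptime(row["ts"], "%m/%d/%Y %H:%M"),
        code=f"LAB//{row['lab_name']}",
        numeric_value=float(row["val"]),
    )


def latest_value(events: list[Event], subject_id: int, code: str) -> Optional[float]:
    """Shared downstream code: runs on any site's data once it is in the common shape."""
    matches = [e for e in events if e.subject_id == subject_id and e.code == code]
    if not matches:
        return None
    return max(matches, key=lambda e: e.time).numeric_value


# Made-up example rows; no real patient data.
site_a_rows = [
    {"patient": "1", "drawn_at": "2024-01-05T08:30:00", "test": "creatinine", "result": "1.1"},
    {"patient": "1", "drawn_at": "2024-01-07T09:00:00", "test": "creatinine", "result": "1.4"},
]
site_b_rows = [
    {"mrn": "77", "ts": "02/10/2024 14:15", "lab_name": "creatinine", "val": "0.9"},
]

events_a = [from_site_a(r) for r in site_a_rows]
events_b = [from_site_b(r) for r in site_b_rows]
```

Only the two small adapter functions are site-specific here; `latest_value` never sees the original column names or date formats. The sketch keeps each site's own test names inside the code string rather than mapping them to a common vocabulary, which is a simplification chosen for the example.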
The authors emphasize that MEDS was designed specifically for AI and machine learning applications, complementing rather than replacing existing clinical data standards.
The framework is intended to support a broad range of use cases in biomedical AI research, including predictive modeling, representation learning, multimodal modeling, and large-scale benchmarking studies. Because the ecosystem is open source, researchers across academia, healthcare, and industry can contribute tools and extensions.
"The big successes in AI have always been driven by the community coming together and being able to collaborate, often in a decentralized, open-source manner, on tools, model parts, and ultimately ecosystems that let us build larger models that scale to massive datasets," McDermott said. "These impressive results in MEDS are just reflecting the benefits you get when the community can share tools or abstract common parts of their pipelines out into a shared library and use them across everyone's data."
The study also highlights the importance of reproducibility and transparency in health AI development as machine learning models increasingly move toward clinical deployment.
Source:
Columbia University Irving Medical Center