MIDAS platform accelerates protein engineering through rapid PCR-based screening

News-Medical

Proteins are critical to life – and to industry. There are countless proteins that could be engineered to treat and even cure serious diseases and cellular dysfunctions. Industrial applications are similarly promising, with proteins increasingly used as enzymes in food manufacturing and in consumer detergents.

While AI can help suggest improvements, each novel protein must still be created in the real world and tested for performance. It is a labor-intensive process that involves constructing the DNA instructions for each protein in yeast or bacteria and growing individual clones for protein production and testing. This can take many days for a single protein of interest and even longer if the protein needs to be tested in mammalian cells, a process that requires retrieving DNA from microbes for transfer to the mammalian cells.

"The fundamental questions of molecular biology remain: how do we make better proteins and how do we understand what makes a protein work?" Lin says. "Doing that work takes valuable time and resources, but we've found a way to dramatically reduce those demands."

Going in circles

Lin and colleagues leapfrogged the traditional microbial assembly process by using a DNA amplification technique known as the polymerase chain reaction (PCR). PCR can amplify linear segments of DNA into millions or billions of copies within hours. By using PCR to build the complete genes that mammalian cells use to express a given protein, they bypassed the need for microbial cloning and DNA transfer. The PCR-produced gene variants can be transferred directly into mammalian cells for functional analysis. The only custom ingredients each PCR reaction needs are short strings of DNA known as "primers", which can be ordered for next-day delivery.
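PCR's speed comes from exponential amplification: in the idealized case, every thermal cycle doubles the number of copies of the target segment, so n cycles turn one template molecule into 2^n copies. The short Python sketch below, built around a hypothetical helper `ideal_pcr_copies`, shows why 20 to 30 cycles are enough to reach the millions-to-billions range. It assumes perfect doubling by default; real reactions run below 100% efficiency and eventually plateau, so treat these numbers as an upper bound.

```python
def ideal_pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds, assuming each cycle multiplies product by (1 + efficiency).

    efficiency = 1.0 models perfect doubling; real reactions are lower and plateau.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles

# One starting template, idealized doubling: 2**20 is about a million, 2**30 about a billion.
for n in (20, 30):
    print(f"{n} cycles: {ideal_pcr_copies(1, n):,.0f} copies")
```

Passing a lower `efficiency` (for example 0.9) shows how quickly the idealized figure drops when each cycle falls short of a full doubling.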

In traditional protein engineering, when researchers identify a promising variant, they have to assemble the gene encoding the protein and clone it into a circular DNA molecule known as a plasmid. They must then transfer the modified plasmids into bacteria or yeast, which copy each unique plasmid in suitable quantities. The plasmid DNA must then be extracted from the microbes and transferred into mammalian cells for validation.

This clone-and-transfer process is laborious, slow, and expensive, and it greatly restricts the number of variants that can feasibly be evaluated. MIDAS changes that calculus. The key insight of Lin's team was to do away with circular plasmids, which PCR cannot produce directly. Instead, they treat DNA as linear information, a form ideally suited to PCR. This allows them to assemble hundreds of gene variants at a time and transfer them directly into mammalian cells in quantity, identifying the best performers quickly and cost-effectively.
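To make the role of primers concrete, here is a minimal, generic sketch of how a single codon substitution can be written into a pair of complementary mutagenic primers, a standard site-directed mutagenesis technique. It is not the published MIDAS primer-design procedure: the function names, the 15-base default flank length, and the example template are hypothetical, and real primer design also checks melting temperature, GC content, and secondary structure.

```python
# Complement table for uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased first)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def mutagenic_primers(template: str, codon_start: int, new_codon: str,
                      flank: int = 15) -> tuple[str, str]:
    """Forward/reverse primers that swap the codon at `codon_start` (0-based) for `new_codon`.

    The forward primer is `flank` template bases, then the new codon, then `flank` more
    template bases; the reverse primer is its reverse complement.
    """
    template, new_codon = template.upper(), new_codon.upper()
    if len(new_codon) != 3 or set(new_codon) - set("ACGT"):
        raise ValueError("new_codon must be exactly three DNA bases")
    if codon_start < flank or codon_start + 3 + flank > len(template):
        raise ValueError("template is too short for the requested flanks")
    forward = (template[codon_start - flank:codon_start]
               + new_codon
               + template[codon_start + 3:codon_start + 3 + flank])
    return forward, reverse_complement(forward)

# Hypothetical 21-base template; replace the AAA (Lys) codon at index 9 with GCA (Ala).
fwd, rev = mutagenic_primers("ATGGCTAGCAAAGGAGAAGAA", codon_start=9, new_codon="GCA", flank=6)
print(fwd, rev)
```

Generating one such primer pair per desired substitution is cheap to script, which is part of why primer-based approaches scale to hundreds of variants.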

"We decided there's nothing magical about the circular structure of plasmids," Lin says. "For PCR, you just need the genetic data. That was the moment of inspiration."

A practical test of 384 variants using MIDAS took about four hours of hands-on lab work and about $2,000 in reagents. With existing methods, an experienced researcher would need approximately 192 hours and about $20,000 in reagents to evaluate the same number of variants. The researchers calculate that MIDAS is almost 50 times faster than cloning-based approaches at one-tenth the cost.
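The headline ratios come from dividing the total hands-on hours and reagent costs quoted above: 192 / 4 = 48, which rounds to "almost 50", and $2,000 / $20,000 = 0.1. The sketch below just performs that division; the helper name `headline_ratios` is hypothetical, and it compares totals as quoted, so any per-variant figure would additionally depend on how many variants each total covers.

```python
def headline_ratios(midas_hours: float, midas_cost: float,
                    cloning_hours: float, cloning_cost: float) -> tuple[float, float]:
    """Return (time ratio, cost fraction) from total hands-on hours and reagent costs."""
    speedup = cloning_hours / midas_hours        # times less hands-on time MIDAS needs
    cost_fraction = midas_cost / cloning_cost    # MIDAS reagent cost as a fraction of cloning
    return speedup, cost_fraction

# Totals quoted in the article: 4 h and $2,000 (MIDAS) vs 192 h and $20,000 (cloning).
speedup, cost_fraction = headline_ratios(4, 2_000, 192, 20_000)
print(f"hands-on time ratio: {speedup:.0f}x")
print(f"reagent cost: {cost_fraction:.0%} of the cloning-based total")
```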

Immediate impact

MIDAS could have immediate real-world implications for biological research. First, it should accelerate important enzyme and biosensor studies, the researchers say. Second, it could improve automation, because PCR primers and reactions are ideally suited to modern liquid-handling robots, which can evaluate hundreds of new proteins at a time. Last, and perhaps most importantly, they believe MIDAS could yield bigger and better sequence-fitness datasets for data-intensive AI training, leading to ever more powerful molecular design models.
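Liquid-handling robots typically track reactions by plate position. As an illustrative sketch (an assumption, not a documented MIDAS layout), the helper below assigns variants to a standard 384-well plate, which has 16 rows labeled A to P and 24 numbered columns, filling it row by row. The function name and variant labels are hypothetical placeholders.

```python
import string

# A standard 384-well plate: 16 rows (A-P) x 24 columns (1-24).
ROWS = string.ascii_uppercase[:16]
N_COLS = 24

def well_name(index: int) -> str:
    """Map a 0-based index to a well name, filling the plate row by row (A1..A24, B1, ...)."""
    if not 0 <= index < len(ROWS) * N_COLS:
        raise ValueError(f"index {index} does not fit on a 384-well plate")
    row, col = divmod(index, N_COLS)
    return f"{ROWS[row]}{col + 1}"

# Hypothetical variant labels; in practice these would come from a design file.
variants = [f"variant_{i:03d}" for i in range(384)]
layout = {well_name(i): name for i, name in enumerate(variants)}

print(layout["A1"], layout["B1"], layout["P24"])
```

Row-major filling is only one convention; many robot scripts fill column by column instead, which would swap the roles of `row` and `col` in `divmod`.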

"We used MIDAS not only to find the best-performing version of a protein but also to understand how well closely related variants work, which is information we can use to train AI models," says co-first author Pengli Wang. "MIDAS is so easy that we can use it to create large data sets very quickly."

Looking forward, Lin believes MIDAS could enable deeper combinatorial searches, tighter integration with robotics, and maps linking gene sequence to molecular fitness. Those maps could feed improved machine-learning models, fueling both computational design and experimental validation.
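Before sequence-fitness data can train a model, each variant's sequence has to be turned into numbers. A common baseline is one-hot encoding, sketched below under the assumption that variants are equal-length protein sequences written with the 20 standard amino-acid letters. The helper name and toy sequences are hypothetical, each resulting vector would be paired with that variant's measured fitness, and this is not a description of the models the researchers use.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(sequence: str) -> list[int]:
    """Flattened one-hot vector: one 20-slot block per residue position."""
    width = len(AMINO_ACIDS)
    vector = [0] * (len(sequence) * width)
    for pos, aa in enumerate(sequence.upper()):
        if aa not in AA_INDEX:
            raise ValueError(f"unknown amino acid {aa!r} at position {pos}")
        vector[pos * width + AA_INDEX[aa]] = 1
    return vector

parent = one_hot("MKAL")
mutant = one_hot("MAAL")  # hypothetical single substitution K2A
changed = sum(a != b for a, b in zip(parent, mutant))
print(len(parent), changed)
```

A single substitution flips exactly two entries (one slot off, one on), which is what lets a model attribute fitness changes to specific positions and residues.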

"MIDAS is at least an order of magnitude faster at real-world validation," Lin says. "It compresses the engineering design-build-test cycle for proteins to just a couple of days, and we think it could drive rapid advances in AI-inspired molecular biology."

Source:

Stanford University

Journal reference: