Engineered enzymes are poised to transform applications in energy, materials, biotechnology, and medicine. Recently, machine learning has emerged as a useful tool for enzyme engineering. Now, a team of bioengineers and synthetic biologists says it has developed a machine-learning-guided platform that can design thousands of new enzymes, predict how they will behave in the real world, and test their performance across multiple chemical reactions.
The study, led by researchers at Stanford University and Northwestern University, is published in Nature Communications under the title "Accelerated enzyme engineering by machine-learning guided cell-free expression."
“Enzyme engineering is limited by the challenge of rapidly generating and using large datasets of sequence-function relationships for predictive design,” the researchers wrote. “To address this challenge, we develop a machine learning (ML)-guided platform that integrates cell-free DNA assembly, cell-free gene expression, and functional assays to rapidly map fitness landscapes across protein sequence space and optimize enzymes for multiple, distinct chemical reactions.”
“We’ve developed a computational process that allows us to engineer enzymes much faster because we don’t have to use living cells to produce the enzymes, as is now the case,” said Michael Jewett, PhD, a professor of bioengineering at Stanford University and senior author of the new study. “Instead, we use machine learning to predict highly active designer enzymes that have been engineered from mutated DNA sequences modeled on the computer instead of created by hand in the lab. We can carry out these experiments in days rather than weeks or, as is often the case, months.”
When engineering new enzymes, scientists typically start with an enzyme already found in nature and modify it so that it carries out the desired function. The DNA encoding these enzyme variants must be purchased from a third-party vendor and then transferred manually into cells to produce the enzymes of interest. Jewett said machine learning can now ease the burden of running potentially thousands of iterations to find a single enzyme that delivers the chemistry a scientist is aiming for.
“We can now do all that on a computer,” he said. “Rather than having to run 10,000 chemical reactions to iteratively improve enzyme activity, we can use machine learning models to predict highly active variants that still do just as well.”
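The article does not describe the team's models in detail. As a rough illustration of the general idea Jewett describes, predicting promising variants instead of testing every one, the sketch below fits an additive ridge regression to activity measurements of single mutants and uses it to rank untested double mutants. This is a minimal sketch under stated assumptions: the mutation labels (`A12V` and so on), the activity values, and the helper names `fit_ridge` and `predict` are hypothetical toy inputs chosen for illustration, not data or code from the study.

```python
from itertools import combinations


def one_hot(variant, features):
    """Encode a variant (a set of mutation labels) as a 0/1 vector over known mutations."""
    return [1.0 if f in variant else 0.0 for f in features]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(variants, activities, features, lam=0.1):
    """Fit activity ~ bias + sum of per-mutation weights by solving the ridge
    normal equations (X^T X + lam*I) w = X^T y; the bias is left unpenalized."""
    X = [[1.0] + one_hot(v, features) for v in variants]  # leading 1.0 is the bias column
    d = len(features) + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(1, d):  # penalize mutation weights only
        xtx[i][i] += lam
    xty = [sum(row[i] * y for row, y in zip(X, activities)) for i in range(d)]
    return solve(xtx, xty)


def predict(weights, variant, features):
    """Predicted activity of a variant under the additive model."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], one_hot(variant, features)))


if __name__ == "__main__":
    features = ["A12V", "L45F", "G88S", "T101I"]  # hypothetical mutation labels
    train = {  # hypothetical normalized activities (parent enzyme = 1.0)
        frozenset(): 1.0,
        frozenset({"A12V"}): 1.5,
        frozenset({"L45F"}): 1.3,
        frozenset({"G88S"}): 0.6,
        frozenset({"T101I"}): 1.1,
    }
    weights = fit_ridge(list(train), list(train.values()), features)
    # Score every untested double mutant and keep the top-ranked one.
    doubles = [frozenset(p) for p in combinations(features, 2)]
    ranked = sorted(doubles, key=lambda v: predict(weights, v, features), reverse=True)
    print(sorted(ranked[0]))  # ['A12V', 'L45F']
```

An additive model like this assumes mutations contribute independently, so it cannot capture interactions between sites; in practice only the top-ranked predictions would then be built and tested in the lab.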
The science of enzyme engineering is not new; what is new is the application of machine learning to it. The approach Jewett and colleagues build on is known as "directed evolution." They are shortcutting the process nature has gone through over the ages, as DNA mutates by chance and new enzymes emerge, sometimes with important consequences.
"It is the structure of the proteins, which is created from the sequence of those amino acids in the molecule, that leads to their function," Jewett said. "Directed evolution is a decades-old field that has developed the ability to mutate amino acids to change the function of the protein. We're just speeding up the process using machine learning and computers." A key feature of the team's workflow is the ability to synthesize and test enzymes in cell-free systems, without intact living organisms, which further accelerates the process.
As a proof of concept, Jewett and colleagues used their new tool to synthesize a small-molecule pharmaceutical at 90% yield, up from an initial 10%, and showed that it can be applied to build multiple specialized enzymes in parallel to make eight additional therapeutics. Jewett is now looking for a pharmaceutical partner to further develop the model.
Jewett's group is interested in expanding its machine learning models to guide catalysis, or enzyme function, across many different types of chemical reactions. In this paper, the team looked only at amide bond formation, a ubiquitous chemical reaction important in areas ranging from pharmaceuticals to foods. But there are other opportunities.
“We could explore multiple opportunities in sustainability and the bioeconomy. You could begin thinking about classes of molecules that degrade toxins from the environment, enhance bioavailability of protein-rich foods, or others that take existing processes that require high pressures, costly components, or toxic reactions and make them faster, safer, and less expensive,” Jewett said.
The work is not without limitations. “High-quality, high-quantity functional data remains a challenge,” he said. “We all know AI needs lots of data, and at this point, it’s just not there.”
But as scientists increasingly use machine learning models to accelerate design, those data needs will only grow, Jewett said, pointing to future work. In this study, the team ultimately assessed about 3,000 enzyme mutants across about 1,000 products and about 10,000 chemical reactions.
“If I wanted to mutate an enzyme to test tens of thousands of variants,” Jewett said, providing a concrete example for scale, “I might find papers out there, but they may report mutant data for ten variants. Not hundreds. Not thousands. Not tens of thousands of reactions, but ten. So, we have a way to go on the data front, but we’ll get there. This is the first step.”