Democratizing Artificial Intelligence in Pre-Clinical Drug Discovery

Avner Schlessinger, PhD, right, working with his lab researchers at Mount Sinai AI Small Molecule Drug Discovery Center in New York City. [Mount Sinai]

“Most breakthrough discoveries are made based on evidence that’s already there,” Ming-Ming Zhou, PhD, asserted in his New York City office as we overlooked a cloudy Central Park. “It takes somebody to connect the dots in a different way to solve the problem.”

Zhou began his faculty career in 1997 and is currently a professor of physiology and biophysics at the Icahn School of Medicine at Mount Sinai. His lab designs chemical compounds to modulate chromatin-mediated gene transcription for therapeutic applications. Zhou’s seminal work in chemical targeting of the bromodomain, a protein module that recognizes acetylated lysine in histones, opened the pharmaceutical field of bromodomain drug discovery to address a wide array of cancers and inflammatory disorders.

Reflecting on the past three decades of therapeutic research, Zhou says structure-based drug discovery has evolved into artificial intelligence (AI)-aided drug discovery, a booming interdisciplinary field that is shifting pre-clinical pipelines from building on literature-defined disease targets to searching massive datasets for never-before-seen leads. According to Zhou, this paradigm shift is what Mount Sinai’s new AI Small Molecule Drug Discovery Center is tackling head-on.

Building a home for AI innovation

Led by Avner Schlessinger, PhD, professor of pharmacological sciences and associate director of the Mount Sinai Center for Therapeutics Discovery, the AI initiative launched in April as New York’s newest hub leveraging computational approaches for pre-clinical development.

To expand access to AI-driven drug discovery, the center will provide hands-on training for the next generation of scientists through seminars, internship programs, and drug discovery hackathons, while fostering AI-focused research collaborations with pharmaceutical companies, biotech firms, and academic institutions.

“Drug discovery is an inefficient process. One of the top limiting factors is insufficient communication, interaction, or thinking outside the box,” Zhou told GEN. “This center is a way of bringing people together to unlock new ideas and technologies that can help us address this limitation.”

In contrast to conventional drug discovery, which relies on slow and resource-intensive experimental workflows, AI models trained on vast datasets of molecular structures and biological activity can predict the properties of new compounds before synthesis. Proponents say the approach could expand the throughput and scale of pre-clinical research programs at a fraction of the time and cost.

Mount Sinai’s center will focus on three core areas: designing novel drug-like molecules using generative AI, optimizing existing compounds to enhance their efficacy and safety, and predicting drug-target interactions to repurpose known drugs or natural products for new indications.

“I was trained in AI and machine learning a long time ago before it was cool,” chuckled Schlessinger as we dodged NYC taxis during my tour of Mount Sinai’s campus. “But now is particularly good timing to use Mount Sinai’s datasets and experts to improve our models for real solutions.”

As a medical school embedded in a hospital system, Mount Sinai’s community emphasizes making an impact on patient care. Many research projects have a highly translational focus, ranging from target identification for Alzheimer’s disease to developing machine learning algorithms to predict the pathogenicity of mutations based on patient data.

Marta Filizola, PhD, professor and dean of the graduate school of biomedical sciences at Mount Sinai, leads the center’s graduate education efforts. She highlights the need for interdisciplinary education to generate the next wave of AI innovation, a notion that motivated the establishment of Mount Sinai’s newest PhD program in Artificial Intelligence and Emerging Technologies in Medicine (AIET).

“We’ve created an infrastructure to increase the visibility of the AI training here at Sinai and give students hands-on experience in research programs that are directly related to improving human health,” she told GEN.

Show me the data

Historically, structure-based drug discovery has largely been fueled by the Protein Data Bank (PDB), a publicly available database that houses over 200,000 entries of experimentally determined protein and nucleic acid structures collected by researchers over 50 years.

While the PDB has been a powerful resource driving AI advances, such as AlphaFold, the protein structure prediction algorithm that earned a Nobel Prize in Chemistry, many novel drug targets fall outside of the PDB, motivating many AI biotechs to invest in their own data generation. Much of this proprietary industry data remains under lock and key.

“A key problem for any party that builds and innovates new model architectures is that they cannot benchmark on proprietary data. The validity for industrial grade research is something that you cannot assess,” said Robin Roehm, CEO and co-founder of Apheris, in an interview with GEN. “Access to industry data for benchmarking is a huge value-add for everyone who builds models.”

Apheris is a start-up focused on enabling governed, private, and secure access to data for machine learning. In March, the Berlin-based company announced an initiative with the AI Structural Biology (AISB) Consortium to fine-tune OpenFold3, a protein structure prediction algorithm developed by the lab of Mohammed AlQuraishi, PhD, assistant professor of systems biology at Columbia University, using proprietary data from AbbVie and Johnson & Johnson in a confidentiality-preserving environment.

The collaboration will evaluate and refine OpenFold3 for predicting 3D structures of molecular complexes, focusing on small molecule-protein and antibody-antigen interactions for drug discovery. As of May, the list of participating drug developers has expanded to include AstraZeneca, Boehringer Ingelheim, Sanofi, and Takeda.

The push for open-source code

Other scientists are looking to make AI molecular models widely accessible to push collaboration forward. In June, researchers from the Massachusetts Institute of Technology (MIT) Jameel Clinic for Machine Learning in Health announced the open-source release of Boltz-2, a model that predicts molecular binding affinity with a speed and accuracy intended to help democratize commercial drug discovery.

Boltz-2 is available under the highly permissive MIT license, which allows commercial drug developers to use the model internally and apply their own proprietary data. The work was done in collaboration with Recursion, the Salt Lake City-based AI drug discovery company that combined with Exscientia last year. The MIT research team was led by Regina Barzilay, PhD, distinguished professor of AI and health at MIT.

Boltz-2 is an answer to the community outcry over the limited accessibility of AlphaFold 3, which was published in Nature in May 2024 by Google DeepMind and Isomorphic Labs without open-source code. AlphaFold 3 expanded the protein structure prediction tool to a broad spectrum of biomolecular interactions, including small molecules, DNA, RNA, and more, offering a powerful next step for drug discovery.

However, the code omission prevented other scientists from reproducing the publication’s results and using the model in their own research efforts, leading more than 1,000 scientists to sign a protest letter calling for AlphaFold 3’s transparency. To address the outcry, AlphaFold 3 developers released the code under a restrictive non-commercial license six months after the Nature publication.

Anshul Kundaje, PhD, associate professor of genetics and computer science at Stanford University, wrote in a letter sent to Nature and posted on the social media platform X that while commercial entities are under no obligation to open source or share details about their products, “this does not mean they get to bypass canonical standards for what constitutes a peer-reviewed and verifiable scientific publication. What Nature published as a peer-reviewed article is in fact an advertisement and at best a white paper.”

Back at MIT, Gabriele Corso, a co-developer of Boltz, said the biggest reward from releasing the model was seeing the community rally behind an open-source project.

“Just at a time where it seemed inevitable that closed models like AlphaFold 3 would dominate the field, many researchers from academia and industry decided to contribute to an open-source project like Boltz to build new capabilities and open them up for everyone to use,” Corso told GEN.

Modeling a potential small molecule therapeutic’s binding to its target (in this case, the OX2 protein). [Recursion]

Lifting all boats

While AlphaFold 3 advanced the accurate prediction of molecular complex structures, in silico binding affinity calculations, as achieved by Boltz-2, had not been (publicly) shown by DeepMind and Isomorphic Labs. Binding affinity measures the strength of interaction between a drug and its target and is a key drug discovery metric that can dictate the progression of a candidate through the development pipeline from hit discovery to lead optimization.

In terms of accuracy, Boltz-2 was the leading affinity performer at the December 2024 Critical Assessment of protein Structure Prediction 16 (CASP16) competition, the biennial experiment that assesses the latest state-of-the-art models in structural biology. As for speed, Boltz-2 is reported to calculate binding affinity values in just 20 seconds, 1,000 times faster than the current physics-based computational standard, free-energy perturbation (FEP) simulations.

Najat Khan, PhD, chief R&D officer and chief commercial officer at Recursion, said the open-source release of Boltz-2 “lifts all boats” by advancing the integration of tech, biology, and chemistry.

“Binding affinity is core to developing a therapeutic start to finish and has been the fundamental issue that a lot of us have been trying to grapple [with],” said Khan. “The value of this collaboration is significant technological advancement geared to the purpose of application, which is drug discovery.”

In May, Recursion said it will end development for four of its 11 pipeline programs and pause a fifth program, in a pruning designed to further focus the AI-based drug developer on cancer and rare disease treatments. The company looks forward to applying Boltz-2 toward future discovery candidates.

While proprietary restrictions remain a reality of commercial drug development, education, data partnerships, and open-source modeling are pushing the field toward a culture of collaboration. Time will tell whether the new AI drug discovery paradigm will be one of true democracy.