Gero’s Structure-Free AI Model Generates Small Molecules from Protein Sequences

Gero, a biotechnology company developing novel therapeutics for aging and chronic diseases, has released ProtoBind-Diff, a masked diffusion language model that generates drug-like molecules for protein targets using only their amino acid sequences. The company has integrated ProtoBind-Diff into its internal drug discovery pipeline and says it is seeking partners for collaborative programs in oncology, immunology, infectious disease, and aging-related conditions.

Details of the model’s performance and design are published in a recent preprint titled “ProtoBind-Diff: A Structure-Free Diffusion Language Model for Protein Sequence-Conditioned Ligand Design.” According to its developers, ProtoBind-Diff was trained on more than a million active protein-ligand pairs. Unlike structure-based models, which are limited to a small set of resolved protein-ligand complexes, ProtoBind-Diff leverages a larger pool of chemical and biological data, which helps the model generalize to underexplored targets with sparse or unavailable structural data.

“Designing small molecules that hit protein targets is one of the hardest problems in drug discovery. Classical modeling struggles because the energy scales, polarization effects, and the complexity of protein dynamics make high-resolution predictions nearly impossible,” explained Peter Fedichev, PhD, Gero’s CEO and co-founder. In contrast, ProtoBind-Diff “learns from sequences, not structures. It doesn’t simulate physics. It learns the grammar of bioactivity from a million real examples.”

Developed as a foundational component of Gero’s generative drug discovery platform, ProtoBind-Diff uses pre-trained protein embeddings and a denoising diffusion framework to generate novel molecules guided by protein sequence data. The developers benchmarked its performance against both classical docking methods and structure-aware deep learning methods. Although the model never observed 3D information during training, results reported in the preprint indicate that its performance matched or exceeded that of structure-based models such as Pocket2Mol and TargetDiff for both well-characterized and low-data targets. Furthermore, the model identified molecules that the company described as high in novelty, drug-likeness, and synthesizability.
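For readers unfamiliar with masked diffusion sampling, the general idea can be illustrated with a toy sketch. This is not Gero's implementation: the token vocabulary, the `toy_denoiser` function, and the unmasking schedule are all hypothetical, and the trained network is replaced by a random stand-in that is merely seeded by the protein sequence. It shows only the loop structure: start from a fully masked token sequence and reveal tokens over several denoising steps, conditioned on the target.

```python
import random

MASK = "?"                      # placeholder for a not-yet-generated token
VOCAB = list("CNOFcn()=1")      # toy SMILES-like token set (assumption)

def toy_denoiser(tokens, protein_seq, rng):
    """Hypothetical stand-in for a trained network.

    Proposes a (token, confidence) pair for every masked position.
    A real model would condition on pre-trained protein embeddings;
    here protein_seq only influences the output through the RNG seed.
    """
    return {
        i: (rng.choice(VOCAB), rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }

def masked_diffusion_sample(protein_seq, length=16, steps=4, seed=0):
    """Iteratively unmask a token sequence, most confident proposals first."""
    rng = random.Random(f"{seed}:{protein_seq}")  # sequence-dependent seed
    tokens = [MASK] * length
    for step in range(steps):
        proposals = toy_denoiser(tokens, protein_seq, rng)
        if not proposals:
            break
        # Reveal an even share of the remaining masked positions per step
        # (ceiling division), so the final step unmasks everything left.
        n_unmask = -(-len(proposals) // (steps - step))
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in ranked[:n_unmask]:
            tokens[i] = tok
    return "".join(tokens)

if __name__ == "__main__":
    # Made-up amino acid sequence; the output is a random token string,
    # not a chemically valid molecule.
    print(masked_diffusion_sample("MKTAYIAKQR"))
```

Because the denoiser here is random, the output carries no chemical meaning. In a trained model, the proposals and confidences would come from a network that has learned which molecular tokens are associated with active ligands for similar sequences.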

Although it is still early days in the model’s development, ProtoBind-Diff already “outperforms some existing 3D structural models,” said Konstantin Avchaciov, PhD, senior researcher at Gero and lead scientist behind the project. “I am confident that as we continue to expand our datasets to include a broader diversity of protein classes, we will achieve significantly better results in the future.”