ML-Enhanced Modeling Improves mAb Design Space Predictions

For monoclonal antibodies (mAbs), traditional methods for identifying the design space typically rely on expensive wet-lab experiments or on in silico models that fall short when data is sparse. A hybrid approach developed by researchers from Imperial College London (ICL) and GSK appears to address that weakness, yielding design space predictions that are accurate and reliable, even for low-density datasets.

The approach, developed by a team led by Maria Papathanasiou, PhD, associate professor, ICL, and Andrea Galeazzi, PhD, postdoctoral researcher, ICL, and first author of a recent paper, informs in silico models from similar projects with results from data-driven experiments. The resulting enhanced hybrid model significantly improves the predictive capability of data-driven modeling and reduces the need for process knowledge.

“The approach can help manufacturers expedite process development by enabling quantitative knowledge transfer and by reducing experimentation,” Papathanasiou tells GEN. It is particularly valuable in early-stage process development.

To determine the design space, the researchers focused on single-column antibody Protein A affinity chromatography. When high-quality experimental data is available in sufficient quantity, that data alone may be enough. Where data is scarce, however, it is augmented by a machine learning-enhanced in silico model.

For the latter hybrid model, researchers pretrained an artificial neural network on synthetic data generated by a high-fidelity process model parameterized for a different system, then fine-tuned that network using wet-lab data. The benefit, they report, is that the wet-lab data can be more limited in both quantity and quality, and that knowledge can be leveraged from other systems without the process model being parameterized for the system of interest.
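The pretrain-then-fine-tune idea described above can be illustrated with a minimal sketch. It is not the researchers' implementation: the two-input toy "process model," the 12 wet-lab points, the network size, and the learning rates are all hypothetical placeholders chosen only to show the workflow.

```python
import numpy as np

def source_process_model(x):
    """Hypothetical stand-in for a process model of a *different* system.
    Returns 1.0 where an operating point lies inside the design space."""
    return (x[:, 0] + x[:, 1] < 1.0).astype(float)

# Hypothetical wet-lab results for the system of interest (12 runs only).
# Its feasible region is slightly larger than the source model's.
X_LAB = np.array([[0.2, 0.2], [0.5, 0.3], [0.3, 0.8], [0.6, 0.5],
                  [0.55, 0.55], [0.4, 0.75], [0.9, 0.25], [0.7, 0.7],
                  [1.0, 0.5], [1.2, 0.3], [0.8, 1.0], [1.3, 1.3]])
Y_LAB = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=float)

def init_params(rng, n_in=2, n_hidden=8):
    return {"W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, (n_hidden, 1)),
            "b2": np.zeros(1)}

def forward(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    z = (h @ params["W2"] + params["b2"])[:, 0]
    p = 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))
    return h, p

def bce_loss(params, x, y):
    _, p = forward(params, x)
    return float(-np.mean(y * np.log(p + 1e-12)
                          + (1.0 - y) * np.log(1.0 - p + 1e-12)))

def train(params, x, y, lr, epochs):
    """Full-batch gradient descent; returns an updated copy of params."""
    params = {k: v.copy() for k, v in params.items()}
    n = len(x)
    for _ in range(epochs):
        h, p = forward(params, x)
        d_z = ((p - y) / n)[:, None]            # d(mean BCE)/d(logit)
        grads = {"W2": h.T @ d_z, "b2": d_z.sum(axis=0)}
        d_h = (d_z @ params["W2"].T) * (1.0 - h ** 2)
        grads["W1"] = x.T @ d_h
        grads["b1"] = d_h.sum(axis=0)
        for k in params:
            params[k] -= lr * grads[k]
    return params

def pretrain_then_finetune(seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: plentiful synthetic data from the source-system model.
    x_syn = rng.uniform(0.0, 1.5, size=(400, 2))
    y_syn = source_process_model(x_syn)
    params0 = init_params(rng)
    pretrained = train(params0, x_syn, y_syn, lr=0.5, epochs=2000)
    # Step 2: fine-tune the same network on the scarce wet-lab data,
    # using a smaller learning rate (an assumption, not from the paper).
    finetuned = train(pretrained, X_LAB, Y_LAB, lr=0.1, epochs=500)
    return {"synthetic_loss_initial": bce_loss(params0, x_syn, y_syn),
            "synthetic_loss_pretrained": bce_loss(pretrained, x_syn, y_syn),
            "lab_loss_pretrained": bce_loss(pretrained, X_LAB, Y_LAB),
            "lab_loss_finetuned": bce_loss(finetuned, X_LAB, Y_LAB)}

if __name__ == "__main__":
    for name, value in pretrain_then_finetune().items():
        print(f"{name}: {value:.3f}")
```

In this toy setup, the network first learns the source system's boundary from synthetic data, and the handful of wet-lab points then shifts that boundary toward the target system without retraining from scratch.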

This approach, however, generates a larger design space than that generated by a purely data-driven approach. Adding the wet-lab data to the in silico model effectively reparameterizes it without restructuring the model, and results in better agreement with the fully data-driven model.

Better accuracy

This hybrid system delivered reliable predictions for the target system, the researchers determined.

When they compared first-order F1 scores (a metric that measures model performance) for high-, medium-, and low-density data between the data-driven and hybrid models, they found that, for low-density data, the hybrid model improved predictive accuracy by 6–27%. For medium-density regions, the hybrid model improved predictive accuracy by 7–15%, although one run was less accurate. For high-density data, there was no significant improvement.

Using second-order F1 scores, which, Papathanasiou explains, “quantify agreement between a target model’s predictions and those of a reference model,” the hybrid model improved prediction accuracy for low-density datasets by 6–41%. For medium-density datasets, the change ranged from a decline of 9% to an improvement of 5%.

“Second-order F1 scores (and, therefore, this hybrid model) can be particularly useful where true experimental labels are scarce, incomplete, or unavailable, which is a common challenge in early-stage biopharmaceutical process development,” she says.
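The difference between the two F1 variants discussed above can be shown with a short sketch. The F1 formula itself is standard; the label arrays are invented for illustration. The sketch also assumes that the first-order score compares predictions with experimental labels and that the reference model is a data-driven one, neither of which the article spells out.

```python
import numpy as np

def f1_score(reference, predicted):
    """F1 = 2*TP / (2*TP + FP + FN), with 1 meaning 'inside the design space'."""
    reference = np.asarray(reference, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = int(np.sum(reference & predicted))
    fp = int(np.sum(~reference & predicted))
    fn = int(np.sum(reference & ~predicted))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical labels for eight operating points (not data from the study).
experimental_labels = [1, 1, 1, 0, 0, 1, 0, 0]   # wet-lab outcome
hybrid_predictions = [1, 1, 0, 0, 0, 1, 1, 0]    # target model
reference_predictions = [1, 1, 1, 0, 0, 1, 1, 0] # assumed reference model

# First-order: target model vs. experimental ground truth.
first_order = f1_score(experimental_labels, hybrid_predictions)
# Second-order: target model vs. another model's predictions,
# usable when experimental labels are missing.
second_order = f1_score(reference_predictions, hybrid_predictions)

if __name__ == "__main__":
    print(f"first-order F1:  {first_order:.3f}")   # 0.750
    print(f"second-order F1: {second_order:.3f}")  # 0.889
```

The same function serves both cases; only the source of the reference labels changes, which is why the second-order score remains usable when experiments are scarce.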