A novel data-gathering approach that relies on “text mining” can help process developers better understand the complex relationships between culture conditions and glycosylation, say the authors of new research.
A protein’s glycosylation profile—the pattern of glycan residues that bind to the core molecule during production—dictates its therapeutic function and efficacy. A reproducible profile is key to achieving quality and consistency goals.
For the most part, biopharma is good at using experimentation to understand how changes to culture conditions are likely to affect glycosylation in a given manufacturing process under development.
Industry has been less adept at using these experimental findings to develop generalized glycosylation relationship models, says Chuming Chen, PhD, a professor at the Delaware Biotechnology Institute at the University of Delaware.
“Despite the extensive body of published work, general relationships between different cell culture conditions and glycosylation profiles remain fragmented across diverse studies. Fragmentation in our knowledge of causes of specific glycan profiles is partially a result of variables and conditions changing from one context to another, and these are not always easily tracked,” he tells GEN.
Text mining and knowledge graphs
With this in mind, Chen, his colleagues, and co-authors from Waters developed an automated way of gathering data from multiple sources, using a technique called “text mining,” and of elucidating relationships between various culture conditions and glycosylation.
“First, we designed a specialized text mining pipeline to automatically extract relationships between cell culture conditions and glycosylation profiles with an 88% accuracy from unstructured scientific literature, eliminating the need for manual curation,” Chen explains.
The researchers then used a normalization strategy to reconcile inconsistencies in the extracted information.
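The article does not describe how the normalization works. As a rough illustration only, one common form of entity normalization maps variant spellings of an extracted term to a single canonical name. In the minimal sketch below, the synonym table, its entries, and the function name are all hypothetical and are not taken from the study.

```python
# Hypothetical synonym table: variant mention -> canonical entity name.
# Entries are illustrative assumptions, not the study's vocabulary.
SYNONYMS = {
    "mn2+": "manganese",
    "mncl2": "manganese",
    "manganese chloride": "manganese",
    "galactosylated glycans": "galactosylation",
}


def normalize_entity(mention: str) -> str:
    """Return a canonical name for a mention, ignoring case and extra spaces."""
    key = " ".join(mention.lower().split())
    # Unknown mentions fall back to their cleaned-up form.
    return SYNONYMS.get(key, key)


mentions = ["MnCl2", "Manganese  chloride", "Mn2+", "temperature"]
canonical = sorted({normalize_entity(m) for m in mentions})
print(canonical)  # ['manganese', 'temperature']
```

Three differently written mentions collapse into one entity, which is what allows relationships from separate papers to attach to the same node later on.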
“These standardized entities and the relationships among them were used to build a unified Knowledge Graph [called the Bioprocess Knowledge Graph Database], which captures both direct and hidden, indirect associations between process parameters and therapeutic glycan outcomes.
“Finally, we developed a web interface that enables researchers to dynamically query, explore, and visualize these complex relationships, ultimately facilitating more informed decision-making in therapeutic protein manufacturing,” he says.
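To make the idea of direct and indirect associations concrete, the sketch below stores relationships as subject-relation-object triples and searches for multi-step paths between a process parameter and a glycan outcome. It is a toy example under stated assumptions: the triples are placeholders rather than findings from the study, and the function names and breadth-first search are illustrative choices, not the design of the Bioprocess Knowledge Graph Database.

```python
from collections import defaultdict, deque

# Toy (subject, relation, object) triples. Illustrative placeholders only,
# not relationships reported by the study.
TRIPLES = [
    ("manganese", "increases", "galactosyltransferase activity"),
    ("galactosyltransferase activity", "increases", "galactosylation"),
    ("temperature shift", "decreases", "galactosylation"),
]


def build_graph(triples):
    """Index triples as adjacency lists: subject -> [(relation, object), ...]."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def find_paths(graph, start, target, max_hops=3):
    """Breadth-first search for paths from start to target within max_hops edges."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) >= max_hops:
            continue
        # Nodes already on this path, used to avoid cycles.
        visited = {start} | {obj for _, _, obj in path}
        for rel, obj in graph.get(node, []):
            if obj not in visited:
                queue.append((obj, path + [(node, rel, obj)]))
    return paths


graph = build_graph(TRIPLES)
direct = find_paths(graph, "temperature shift", "galactosylation")
indirect = find_paths(graph, "manganese", "galactosylation")
for path in direct + indirect:
    print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```

A one-edge path corresponds to a direct association, while a path of two or more edges is an indirect one that no single source sentence needs to state outright.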
The approach has application in biopharmaceutical manufacturing, according to Chen, who suggests it can be used to guide early-phase process development.
“The benefit to biopharmaceutical manufacturing process researchers is to identify specific contexts and conditions that may increase or decrease the target glycan structure. Because glycan structure can sometimes impact the mechanism of action or drug-patient interactions, this can be highly useful information,” he adds.
Looking forward, Chen and his co-authors plan to expand their system to include more information that is relevant to production.
“Our final product is an interface that is queryable and visualizable. Although it has been developed as a prototype, this is automated and can be extended to incorporate more information related to biopharmaceutical manufacturing. We are currently extending our project to include deep learning and LLM for relation extraction,” he says.

