AI Reveals How Protein Modifications Link Mutations to Disease

Scientists at Baylor College of Medicine say they have developed an artificial intelligence (AI) model that reveals how protein modifications link genetic mutations to disease. The method, called DeepMVP, is described in the Nature Methods study “DeepMVP: deep learning models trained on high-quality data accurately predict PTM sites and variant-induced alterations.” According to the research team, it significantly outperforms previously published models and has implications for the development of novel therapeutics.

“Proteins are responsible for all the functions of the body, from growing tissues to regulating metabolism or fighting disease. Their functions are often regulated by modifications that take place after proteins are made through a process called post-translational modification (PTM),” said Bing Zhang, PhD, corresponding author and professor at the Lester and Sue Smith Breast Center and in the Department of Molecular and Human Genetics at Baylor. He is also a McNair scholar and a member of Baylor’s Dan L Duncan Comprehensive Cancer Center.

The modifications include the addition of chemical groups, such as phosphates or sugars, which influence how a protein behaves, where it goes in the cell or how long it lasts. When PTMs go wrong, proteins may not perform as expected and can contribute to diseases such as cancer, heart conditions or neurological disorders.

Bing Zhang, PhD

Understanding where PTMs happen can help predict how mutations in these locations may change a protein’s function in ways that affect a person’s health. For instance, PTMs can be disrupted by DNA mutations that can remove a PTM site in a protein, create a new site or affect nearby regions, altering the protein’s function.
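
The three outcomes described above (a site removed, a new site created, or a nearby region altered) can be illustrated with a toy sketch. This is not DeepMVP and involves no trained model. It assumes a single PTM type, phosphorylation, and the simplification that only serine (S), threonine (T) and tyrosine (Y) can carry it. The function name, the flank size of 7 residues and the example sequence are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: this is NOT DeepMVP and uses no trained model.
# Assumption: phosphorylation is possible only on S, T or Y residues.
PHOSPHO_RESIDUES = {"S", "T", "Y"}


def classify_variant(sequence, position, new_residue, known_sites, window=7):
    """Classify a single-residue substitution relative to known PTM sites.

    position and known_sites use 1-based residue numbering (hypothetical inputs).
    window is an assumed flank size, not a value taken from the study.
    """
    old_residue = sequence[position - 1]
    if old_residue == new_residue:
        return "no change"
    if position in known_sites:
        # The modified residue itself is mutated.
        if new_residue not in PHOSPHO_RESIDUES:
            return "site removed"
        return "site residue changed"
    if new_residue in PHOSPHO_RESIDUES and old_residue not in PHOSPHO_RESIDUES:
        # A residue that could carry a phosphate has appeared.
        return "candidate new site"
    if any(abs(position - site) <= window for site in known_sites):
        # The mutation sits in the sequence context around a known site.
        return "flanking region altered"
    return "no known PTM site nearby"


# Made-up 10-residue sequence with one assumed phosphosite at position 4.
example_sequence = "MKRSPTAYLG"
example_sites = {4}

print(classify_variant(example_sequence, 4, "A", example_sites))
print(classify_variant(example_sequence, 7, "S", example_sites))
print(classify_variant(example_sequence, 5, "L", example_sites))
```

A rule-based check like this only labels the kind of change. Estimating whether a flanking mutation actually raises or lowers modification at a site is the part that requires a learned model such as the one the researchers describe.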

“We developed DeepMVP, a computational model to predict where in a protein PTMs happen and which mutations in those locations can affect PTMs,” said co-first author Chenwei Wang, PhD, a postdoc in the Zhang lab. “To train the model to recognize patterns in protein sequences that indicate PTM sites, we created the PTMAtlas, a curated compendium of 397,524 known PTM sites generated through systematic reprocessing of 241 public datasets. We focused on six common PTMs.”

PTMAtlas includes nearly 400,000 PTM sites across thousands of human proteins. Compared to other databases, PTMAtlas is more comprehensive and accurate, according to Zhang. DeepMVP, which was trained on it, can predict PTM sites on all human proteins and even in viral proteins such as those from SARS-CoV-2, which Zhang said indicates that DeepMVP is a powerful resource for studying protein modifications.

DeepMVP reportedly outperformed eight existing tools of a similar kind. In a test of how well it predicts the effect of mutations on PTMs, using a curated set of 235 known mutation-PTM pairs from the scientific literature, DeepMVP correctly predicted the PTM site in 81% of cases and the direction of change (increase or decrease) in 97% of cases.

“We anticipate that DeepMVP can be applied to cancer, neurological conditions and cardiovascular diseases and accelerate discoveries in genetics, cancer biology and drug development,” Zhang said. “The tool is freely available to researchers worldwide at https://deepmvp.ptmax.org/.”