Proteogenomics Algorithm Improves the Detection of Hidden Genetic Mutations

Protein synthesis. Illustration of a ribosome (centre) producing a protein (red) from an mRNA (messenger ribonucleic acid, multicoloured) template. This process is known as translation. mRNA consists of groups of three nucleotide bases that code for different amino acids, the building blocks of proteins. The ribosome attaches to the mRNA and reads its code. A transfer RNA (tRNA) molecule (dark purple) carrying an amino acid (red) corresponding to the code then binds to the ribosome. When the tRNA dissociates it leaves the amino acid behind, and the ribosome moves onto the next bases. As the ribosome moves along the mRNA the protein grows from the ribosome.

Credit: Juan Gaertner / Science Photo Library / Getty Images

Proteogenomics explores how genetic information translates into protein expression and function, and how changes across DNA, RNA, and proteins influence disease development and progression. A major challenge is accurately detecting variant peptides: existing proteomic tools often fail to capture the full diversity of protein variations, which limits the ability to identify genetic mutations at the protein level.

In a new study published in Nature Biotechnology titled “Identification of non-canonical peptides with moPepGen,” researchers from the University of California, Los Angeles (UCLA), and the University of Toronto describe a new algorithm, termed moPepGen (multi-omics peptide generator), which uses a graph-based approach to process all types of genetic changes in a computationally efficient manner and to significantly improve the detection of hidden protein variants.

The algorithm’s applications include improving immunotherapy by identifying cancer-specific variant peptides that may serve as neoantigen candidates, which is key to developing personalized cancer vaccines and cell therapies.   

“By making it easier to analyze complex protein variations, moPepGen has the potential to advance research in cancer, neurodegenerative diseases, and other fields where understanding protein diversity is critical,” said Paul Boutros, PhD, professor of urology and human genetics at the David Geffen School of Medicine at UCLA and co-senior author of the study. “It bridges the gap between genetic data and real-world protein expression, unlocking new possibilities in precision medicine and beyond.” 

moPepGen identifies non-canonical peptides (NCPs), which are peptides derived from regions of the genome that are not typically considered part of protein-coding genes. NCPs have been documented to have key roles in cellular processes, immune responses, and cancer. However, existing methods for identifying NCPs are computationally expensive, have elevated false-negative rates, and can produce results that are difficult to interpret.

Unlike existing methods, which primarily detect simple genetic changes such as single amino acid substitutions, moPepGen is designed to identify a wide range of protein variations caused by alternative splicing, circular RNAs, gene fusions, RNA editing, and other complex genetic modifications. The tool systematically models how genes are expressed and translated into proteins, significantly expanding the ability to detect disease-associated mutations. 
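To make the idea of enumerating variant peptides from a graph concrete, here is a minimal toy sketch in Python. It is not moPepGen's implementation or API. It assumes a single protein sequence, handles only amino acid substitutions, and uses a simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages). The function names `tryptic_digest` and `variant_peptides` are hypothetical and chosen for illustration only.

```python
from itertools import product


def tryptic_digest(protein):
    """Split a protein after K or R unless the next residue is P.

    Simplified rule for illustration: no missed cleavages are modeled.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def variant_peptides(reference, substitutions):
    """Return peptides that appear only when at least one variant is present.

    `substitutions` maps a 0-based position to an alternative amino acid
    (a hypothetical input format for this sketch).
    """
    # Each position is one layer of a linear graph: the reference residue,
    # plus an alternative residue wherever a substitution is known.
    layers = [
        [aa] + ([substitutions[i]] if i in substitutions else [])
        for i, aa in enumerate(reference)
    ]
    canonical = set(tryptic_digest(reference))
    found = set()
    # Walk every path through the graph, digest it, and keep new peptides.
    for path in product(*layers):
        for peptide in tryptic_digest("".join(path)):
            if peptide not in canonical:
                found.add(peptide)
    return found


if __name__ == "__main__":
    print(sorted(variant_peptides("MAGKTLRPEKSVR", {1: "V", 11: "L"})))
```

Walking every path grows exponentially with the number of variants, so this brute-force version is only practical for tiny inputs. Complex events such as alternative splicing, gene fusions, and circular RNAs are outside the scope of this sketch.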

In human cancer proteomes, moPepGen enumerated previously unobservable NCPs arising from germline and somatic genomic variants, noncoding open reading frames, RNA fusions, and RNA circularization. 

The authors used moPepGen to analyze proteogenomic data from five prostate tumors, eight kidney tumors, and 376 cell lines. They found that moPepGen successfully identified previously undetectable protein variations linked to genetic mutations, gene fusions, and other molecular changes. The algorithm also offered performance improvements by detecting four times more unique protein variants than older approaches. 

“We developed moPepGen to help researchers determine which genetic variants are truly expressed at the protein level, addressing a long-standing challenge in the proteogenomic community,” said Chenghao Zhu, PhD, a postdoctoral scholar in the department of human genetics at UCLA and co-first author of the study. “This provides a more comprehensive view of protein diversity and gives researchers a much more accurate picture of how mutations influence disease.”

The tool is freely available for researchers and can integrate with existing proteomics workflows.