Proteogenomics explores how genetic information translates into protein expression and function, and how changes across DNA, RNA, and proteins influence disease development and progression. A major challenge is accurately detecting variant peptides: existing proteomic tools often fail to capture the full diversity of protein variation, which limits researchers' ability to identify genetic mutations at the protein level.
In a new study published in Nature Biotechnology titled “Identification of non-canonical peptides with moPepGen,” researchers from the University of California, Los Angeles (UCLA), and the University of Toronto describe moPepGen (multi-omics peptide generator), an algorithm that uses a graph-based approach to process all types of genetic changes in a computationally efficient manner and significantly improve the detection of hidden protein variants.
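The article does not describe moPepGen's graph in detail. As a rough illustration of why graphs suit this problem, the Python sketch below (a toy under stated assumptions, not moPepGen's implementation) represents a reference sequence carrying non-overlapping variants as a "bubble" graph: invariant stretches are single nodes, and each variant site offers a reference branch and an alternative branch. Every path through the graph spells one possible variant sequence. The function names `build_bubbles` and `enumerate_paths` and the `(start, ref, alt)` variant format are hypothetical.

```python
from itertools import product


def build_bubbles(reference, variants):
    """Toy sketch (not moPepGen's code): turn a reference plus non-overlapping
    variants into an ordered list of segments. Each segment is a tuple of
    alternative sequences: invariant stretches have one option, variant sites
    have (ref_allele, alt_allele). Consecutive segments are fully connected,
    so the segments form a simple directed acyclic graph."""
    segments = []
    cursor = 0
    for start, ref, alt in sorted(variants):
        if start < cursor:
            raise ValueError("overlapping variants are not handled in this sketch")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {start}")
        if start > cursor:
            segments.append((reference[cursor:start],))  # invariant stretch
        segments.append((ref, alt))  # variant bubble: two parallel branches
        cursor = start + len(ref)
    if cursor < len(reference):
        segments.append((reference[cursor:],))  # trailing invariant stretch
    return segments


def enumerate_paths(segments):
    """Spell every source-to-sink path by choosing one branch per segment."""
    return ["".join(choice) for choice in product(*segments)]


if __name__ == "__main__":
    # Hypothetical example: an SNV (C->T at 1) and a 1-base deletion (C at 5).
    segments = build_bubbles("ACGTACGT", [(1, "C", "T"), (5, "C", "")])
    for path in enumerate_paths(segments):
        print(path)
    # prints ACGTACGT, ACGTAGT, ATGTACGT and ATGTAGT, one per line
```

Exhaustive enumeration like this doubles with every added biallelic site (n sites yield 2^n sequences), which is one reason a practical tool needs a more economical traversal; how moPepGen achieves its reported efficiency is not covered here.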
The algorithm could also help improve immunotherapy by identifying cancer-specific variant peptides that may serve as neoantigen candidates, a key step in developing personalized cancer vaccines and cell therapies.
“By making it easier to analyze complex protein variations, moPepGen has the potential to advance research in cancer, neurodegenerative diseases, and other fields where understanding protein diversity is critical,” said Paul Boutros, PhD, professor of urology and human genetics at the David Geffen School of Medicine at UCLA and co-senior author of the study. “It bridges the gap between genetic data and real-world protein expression, unlocking new possibilities in precision medicine and beyond.”
moPepGen identifies non-canonical peptides (NCPs): peptides that are absent from reference protein databases, such as those arising from genetic variants or from regions of the genome not typically considered protein-coding. NCPs have documented roles in cellular processes, immune responses, and cancer. However, existing methods for identifying them are computationally expensive, have elevated false-negative rates, and can produce results that are difficult to interpret.
Unlike existing methods, which primarily detect simple genetic changes such as single amino acid substitutions, moPepGen is designed to identify a wide range of protein variants caused by alternative splicing, circular RNAs, gene fusions, RNA editing, and other complex molecular events. The tool systematically models how genes are transcribed and translated into proteins, significantly expanding the ability to detect disease-associated mutations.
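To make the idea of a non-canonical peptide concrete, the sketch below digests variant and reference protein sequences in silico and keeps only the peptides that the reference digest does not contain. It assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and a 7 to 30 residue length window; these settings, the function names, and the example sequences are illustrative assumptions, not moPepGen's actual parameters or code.

```python
import re

# Simplified trypsin rule (an assumption for this sketch): cut after K or R
# unless the next residue is P. The pattern is zero-width, so split()
# (Python 3.7+) cuts between residues without consuming any.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(protein, min_len=7, max_len=30):
    """Return the set of fully cleaved tryptic peptides within a length window."""
    pieces = TRYPSIN_SITE.split(protein)
    # A C-terminal K/R yields an empty trailing piece; the length filter drops it.
    return {p for p in pieces if min_len <= len(p) <= max_len}


def non_canonical_peptides(variant_proteins, canonical_proteins, min_len=7, max_len=30):
    """Peptides produced by any variant protein but by no canonical protein."""
    canonical = set()
    for protein in canonical_proteins:
        canonical |= tryptic_digest(protein, min_len, max_len)
    found = set()
    for protein in variant_proteins:
        found |= tryptic_digest(protein, min_len, max_len)
    return found - canonical


if __name__ == "__main__":
    reference = "MASTEGLLKAVDPQWYTRGLNHEFSK"  # hypothetical canonical protein
    variants = [
        "MASTEGLLKAMDPQWYTRGLNHEFSK",  # hypothetical V->M substitution
        "MASTEGLLNAVDPQWYTRGLNHEFSK",  # hypothetical K->N that removes a cleavage site
    ]
    print(sorted(non_canonical_peptides(variants, [reference])))
    # ['AMDPQWYTR', 'MASTEGLLNAVDPQWYTR']
```

The substitution yields one altered peptide, while the K->N change removes a cleavage site and merges two reference peptides into one longer, non-canonical peptide, so a variant's effect on cleavage sites matters as much as the substituted residue itself.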
In human cancer proteomes, moPepGen enumerated previously unobservable NCPs arising from germline and somatic genomic variants, noncoding open reading frames, RNA fusions, and RNA circularization.
The authors used moPepGen to analyze proteogenomic data from five prostate tumors, eight kidney tumors, and 376 cell lines, and found that it identified previously undetectable protein variants linked to genetic mutations, gene fusions, and other molecular changes. The algorithm also detected four times as many unique protein variants as older approaches.
“We developed moPepGen to help researchers determine which genetic variants are truly expressed at the protein level, addressing a long-standing challenge in the proteogenomic community,” said Chenghao Zhu, PhD, a postdoctoral scholar in the Department of Human Genetics at UCLA and co-first author of the study. “This provides a more comprehensive view of protein diversity and gives researchers a much more accurate picture of how mutations influence disease.”
The tool is freely available for researchers and can integrate with existing proteomics workflows.