Holy Peptide Model! BATMAN Outperforms Peptide-TCR Binding Prediction Standard

Imagine your immune cells could be modified to attack any kind of cancer. T cell receptor (TCR) therapy has the potential to one day become a universal cancer treatment. But there are risks. Cells announce their state by displaying peptides on their surface, and T cells use these peptides to distinguish cancerous from healthy cells. While TCR therapy needs laser focus to prevent friendly fire, T cell receptors can recognize more than one peptide, and this cross-reactivity can lead to T cells attacking the wrong targets.

The number of peptides and TCRs in the human body is also enormous, making it costly and nearly impossible to determine experimentally which peptides a given TCR can bind. Researchers at Cold Spring Harbor Laboratory (CSHL) have now developed a database, BATCAVE (benchmark for activation of T cells with cross-reactive avidity for epitopes), containing over 22,000 TCR-peptide interactions, together with an AI model, BATMAN (Bayesian inference of activation of TCR by mutant antigens), trained on BATCAVE, which can predict which peptides a TCR will bind. During testing, BATMAN outperformed competing models in accurately predicting which peptides bind to a given TCR.

To develop BATCAVE and BATMAN, CSHL Assistant Professor Hannah Meyer, PhD, teamed with Associate Professor Saket Navlakha, PhD, and postdoc Amitava Banerjee, PhD. Senior and co-corresponding author Meyer and colleagues described their developments in Cell Systems, in a paper titled “T cell receptor cross-reactivity prediction improved by a comprehensive mutational scan database.”

A single T cell receptor can recognize a variety of peptides, a property known as TCR cross-reactivity, the authors explained. “Predicting which peptides a TCR cross-reacts to is critical for numerous applications, including predicting viral escape, cancer neoantigen immunogenicity, autoimmunity, and off-target toxicity of T cell-based therapies.” And while mapping all the targets of a T cell receptor is important for predicting pathogenic escape and off-target effects of TCR therapies, “… this mapping has been challenging due to lack of unbiased benchmarking datasets and computational methods sensitive to small-peptide mutations,” the team commented. “… predicting interactions among TCRs, peptides, and major histocompatibility complexes (TCR-pMHCs) remains challenging …”.

To address this challenge, the team developed BATCAVE, a T cell receptor cross-reactivity database of more than 22,000 TCR-pMHC pairs. “… we curated the benchmark for activation of T cells with cross-reactive avidity for epitopes (BATCAVE) database, encompassing near-complete single-amino-acid mutational assays, centered around 25 immunogenic epitopes, across both major histocompatibility complex classes, against 151 human and mouse TCRs, containing 22,000+ TCR-peptide pairs in total,” they explained.
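To give a sense of what a "near-complete single-amino-acid mutational assay" covers, the short Python sketch below enumerates every peptide that differs from an index peptide at exactly one position. It is an illustration only, not code from the paper: the function name `single_mutants` is hypothetical, and the peptide SIINFEKL is used simply because it is a widely known model epitope, not as a statement about what BATCAVE contains.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(index_peptide):
    """Return every peptide differing from index_peptide at exactly one position."""
    mutants = []
    for pos, wild_type in enumerate(index_peptide):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                # Swap in the substitute residue at this position only.
                mutants.append(index_peptide[:pos] + aa + index_peptide[pos + 1:])
    return mutants


# Illustrative index peptide (an assumption for this example).
index_peptide = "SIINFEKL"
scan = single_mutants(index_peptide)

# 8 positions x 19 possible substitutions per position
print(len(scan))  # 152
```

A full scan grows only linearly with peptide length, which is why such assays are experimentally tractable, whereas the space of all peptides of the same length (20 to the power of the length) is not.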

The investigators also created BATMAN, an interpretable Bayesian model, trained on BATCAVE, for predicting the peptides that activate a TCR. “Using this database, we then developed a computational method, called BATMAN, that predicts TCR activation of peptides based on their distances to the TCR’s index peptide,” they continued.

In addition, the team developed an active learning (AL) extension to BATMAN that efficiently maps targets of a novel TCR by selecting a few peptides to assay. “This version provides an efficient way to sample from the prohibitively large antigenic space by iteratively selecting peptides to assay that provide the best improvement of novel TCR activation prediction accuracy.”

Navlakha further explained, “We trained [BATMAN] on a bunch of TCRs and what they recognize. But give me a new TCR that is not in my database, and I need to figure out what it binds to. So, we ask, which are the best peptides I should select to make predictions?”
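The question Navlakha poses is the core of pool-based active learning: choose the next experiment, run it, update the model, and repeat. The Python sketch below shows that loop in its simplest form. It is a toy stand-in and not BATMAN's selection criterion: the count-based uncertainty heuristic (assay the position with the fewest observations so far), the function names, and the simulated `mock_assay` are all assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(index_peptide):
    """Return (position, peptide) pairs for all single-amino-acid mutants."""
    return [
        (pos, index_peptide[:pos] + aa + index_peptide[pos + 1:])
        for pos, wild_type in enumerate(index_peptide)
        for aa in AMINO_ACIDS
        if aa != wild_type
    ]


def active_learning(index_peptide, assay, budget):
    """Select `budget` peptides to assay, always from the least-observed position.

    `assay` is any callable mapping a peptide to an activation in [0, 1],
    normalized so the index peptide would score about 1.
    """
    pool = single_mutants(index_peptide)
    counts = [0] * len(index_peptide)          # assays seen per position
    effect_sums = [0.0] * len(index_peptide)   # summed loss of activation
    assayed = []
    for _ in range(min(budget, len(pool))):
        # Toy uncertainty: fewer observations at a position = more uncertain.
        choice = min(range(len(pool)), key=lambda i: counts[pool[i][0]])
        pos, peptide = pool.pop(choice)
        activation = assay(peptide)            # the "experiment"
        counts[pos] += 1
        effect_sums[pos] += 1.0 - activation
        assayed.append((pos, peptide, activation))
    # Mean loss of activation per position; None where nothing was assayed.
    estimates = [
        effect_sums[p] / counts[p] if counts[p] else None
        for p in range(len(index_peptide))
    ]
    return assayed, estimates


def mock_assay(peptide):
    """Simulated TCR (hypothetical): only a change at the fifth residue matters."""
    return 0.1 if peptide[4] != "F" else 0.9


assayed, estimates = active_learning("SIINFEKL", mock_assay, budget=8)
for pos, peptide, activation in assayed:
    print(pos, peptide, activation)
```

With a budget of eight assays on an eight-residue peptide, this heuristic probes each position once and already identifies the single position the simulated receptor is sensitive to. A real criterion would instead weigh how much each candidate is expected to improve prediction accuracy, as the authors describe.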

The model also revealed why seemingly unrelated peptides get caught in the crossfire. “We show that BATMAN outperforms existing methods, reveals structural and biochemical predictors of TCR-peptide interactions, and can predict polyclonal T cell responses and TCR targets with high sequence dissimilarity,” the investigators commented. “Overall, the interpretable parameters learned by BATMAN capture a host of biological features that reveal the nature of TCR-pMHC interactions.”

Meyer further explained, “It’s not enough to just count differences between potential targets. It matters where the difference is and what type of difference it is. Our model is already good enough to tell us if there are peptides we should be concerned about for targeted [cancer] therapies.”
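Meyer's point can be made concrete by comparing a plain mismatch count (Hamming distance) with a distance that weights each mismatch by its position and by the kind of substitution. The Python sketch below is a minimal illustration under stated assumptions, not BATMAN itself: the position weights are invented, the four-class grouping of amino acids is a crude simplification, and in the real model such parameters are inferred from data rather than set by hand.

```python
# Crude physicochemical classes (an illustrative assumption, not a learned matrix).
CLASSES = {
    "hydrophobic": "AVLIMFWPGC",
    "polar": "STNQY",
    "positive": "KRH",
    "negative": "DE",
}
AA_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def hamming(a, b):
    """Count the positions at which two equal-length peptides differ."""
    return sum(x != y for x, y in zip(a, b))


def substitution_cost(x, y):
    """Toy cost: swaps within a class are cheaper than swaps across classes."""
    return 0.5 if AA_CLASS[x] == AA_CLASS[y] else 1.0


def weighted_distance(index_peptide, peptide, position_weights):
    """Sum position weight x substitution cost over all mismatched positions."""
    total = 0.0
    for pos, (x, y) in enumerate(zip(index_peptide, peptide)):
        if x != y:
            total += position_weights[pos] * substitution_cost(x, y)
    return total


index_peptide = "SIINFEKL"
# Invented weights: central positions count more than the ends.
position_weights = [0.2, 0.4, 0.6, 1.0, 1.0, 0.6, 0.4, 0.2]

conservative_edge = "SIINFEKV"   # L -> V at the last position, same class
disruptive_center = "SIINKEKL"   # F -> K at the fifth position, different class

for mutant in (conservative_edge, disruptive_center):
    print(
        mutant,
        hamming(index_peptide, mutant),
        weighted_distance(index_peptide, mutant, position_weights),
    )
```

Both mutants are one mismatch away from the index peptide, so a simple count cannot tell them apart, while the weighted distance separates them by a factor of ten under these made-up parameters.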

Despite the promise, there’s more to be done before BATMAN can venture from the BATCAVE for potential clinical use, the authors suggested. As large as the database is, it houses only a fraction of all possible TCR-peptide pairs. More data could enhance BATMAN’s performance and potentially help scientists answer fundamental questions about the immune system.

“BATMAN could be improved by incorporating TCR sequence information into the model and by training on datasets from other types of experimental TCR cross-reactivity assays, (e.g., yeast display library enrichment, T-Scan, and SABR), which sample more comprehensively outside the one aa mutational scan space,” the investigators pointed out. Insights from their database and methods could also be used to predict how TCR sequences determine the relationship between TCR-binding affinity and cross-reactivity. The data could potentially aid in the design of high-affinity TCRs with limited off-target cross-reactivity, which would have multiple clinical applications. “There’s a lot of variation in the body’s T-cell response,” Banerjee said. “If we can accurately predict how these cells and peptides interact, that will be very helpful for designing future therapies not only for cancer, but all human illnesses.”

In summary, the investigators wrote, “Overall, BATMAN fills a hitherto unoccupied niche of TCR-pMHC prediction methods by accurately discriminating between small differences in peptide sequences, which we show existing methods fail to predict, but which are essential for understanding neoantigen immunogenicity and off-target effects of TCR-based therapies.”