Long-Read Nanopore Sequencing Improves Rare Disease Diagnosis

Whole genome sequencing (WGS) is not necessarily a solution for someone with a rare, monogenic disease. Indeed, more than half of families with suspected rare monogenic diseases still have no answer after WGS performed with short-read sequencing. Now, a group of researchers has presented data suggesting that long-read sequencing may be the better option for these families. The more comprehensive dataset can reveal variation that short reads miss, eliminate the need for multiple specialized tests, and streamline the diagnosis of rare diseases.

This work is published in The American Journal of Human Genetics, in an article titled, “Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection.” The findings suggest that long-read sequencing has the potential to improve the rate of diagnosis while reducing the time to diagnosis from years to days—in a single test and at a much lower cost.

“Today, the diagnostic yield of genetic sequencing is frustratingly low,” said Benedict Paten, PhD, professor of biomolecular engineering at UCSC Genomics Institute. “One likely cause is the incomplete sequencing methods used in clinical practice. In this work, we test the hypothesis that new, more comprehensive long-read sequencing can generate additional information useful for genetic diagnosis. We were excited to discover numerous additional potentially interesting genetic variants and epigenetic signals in our cohort. While it is still early days, there is great promise in this information, and it will take time for the community to interpret and fully understand much of this new information.”

The researchers partnered with clinicians to work on the cases of 42 patients with rare diseases—some of whom received a diagnosis via short-read methods or other testing, and some of whom were still undiagnosed. In some cases, the researchers had access to parental genetic information, but in others, they did not.

The patients' genomes were sequenced with long-read nanopore technology. The genomic data were then analyzed computationally to identify small and large variants, phasing, and methylation, all within a single pipeline called the Napu pipeline. The analysis takes around a day or less, depending on the computer processing speed, and costs $100.

After sequencing and analyzing the patient data, achieving ∼36× average coverage and a 32-kb read N50 from a single flow cell, the researchers found that long reads provided a more exhaustive dataset than can be derived with short-read sequencing (SRS).
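For readers unfamiliar with these two metrics, the short Python sketch below shows how average coverage and read N50 are conventionally computed from a list of read lengths. It is an illustration only, not code from the Napu pipeline: the function names and the toy read lengths are hypothetical, and real tools work from sequencing files rather than a hand-written list.

```python
def read_n50(lengths):
    """Return the N50 of a set of reads.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length
    raise ValueError("read_n50 needs at least one read")


def mean_coverage(lengths, genome_size):
    """Average coverage: total sequenced bases divided by genome size."""
    return sum(lengths) / genome_size


# Toy example with made-up read lengths (in bases) and a made-up genome size.
toy_reads = [40_000, 32_000, 20_000, 8_000]
print(read_n50(toy_reads))               # 32000
print(mean_coverage(toy_reads, 10_000))  # 10.0
```

In the toy example, the two longest reads already hold more than half of the 100,000 sequenced bases, so the N50 is the length of the second read. A 32-kb N50 in the study therefore means that half of the sequenced bases sat in reads at least 32 kb long.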

The long-read sequencing covered “coding exons in ∼280 genes and ∼5 known Mendelian disease-associated genes that were not covered by SRS.” It also detected “rare, functionally annotated variants, including structural variants (SVs) and tandem repeats, and completely phased 87% of protein-coding genes.”

The result was a conclusive diagnosis for 11 of the 42 patients in the cohort, providing everything that was known from the short-read data as well as additional information, including additional rare candidate variants, long-range phasing, and methylation.

The 11 diagnosed cases included four of congenital adrenal hyperplasia (a rare condition where the adrenal glands are enlarged and fail to function properly). The gene responsible for this disease lies in a particularly challenging region of the genome: it can’t be characterized with short-read sequencing technology, and the current clinical test is cumbersome and incomplete.

“To solve these cases, we developed a new pangenomic tool that integrates new high-quality assemblies like the ‘telomere-to-telomere’ reference genome,” said Jean Monlong, PhD, a former postdoc in the Paten lab who is currently at INSERM in France. “We were excited to see that we could find and phase the pathogenic variants of all four patients suffering from this disease in our cohort. In the future, it might offer a rapid and comprehensive clinical test. We know many rare diseases involve regions of the human genome that have been historically difficult to study, so our results encourage us to extend our approach to more of those diseases that have been at a standstill for a long time.”

In addition, two cases involved disorders of sex development, and one was a rare case of Leydig cell hypoplasia, in which underdeveloped Leydig cells in the testes affect male sexual development. Finally, four cases of neurodevelopmental disorders, each representing a long and challenging diagnostic odyssey, were resolved.

“There’s so much more of the genome that the long reads can unlock,” said Shloka Negi, a UCSC BME PhD student. “But, it will take some time until we can fully interpret this new information revealed by long reads. This data has been absent from our clinical databases, which were built using short-read analysis and mapping to the standard reference. We showed that long reads are uncovering about 5.8% more of the telomere-to-telomere genome that short reads simply couldn’t access.”