In 2001, after 11 years of work and nearly half-a-billion dollars in US government funding, two initial versions of a human reference assembly were published, but the work did not end there. The sequence presented as the complete human genome still contained millions of ‘N’ characters, each marking a position where the actual base is unknown, and entire regions could not be precisely mapped. It took another two decades to fill in those gaps. The effort was led by the open, community-based Telomere-to-Telomere (T2T) consortium and resulted in an updated, complete assembly, which was published in Science by Nurk et al. last week.
The paper was accompanied by a perspective from Deanna Church, Inscripta’s Vice President of Mammalian Business Area and Software Strategy. Dr. Church began her career as a staff scientist at NCBI working on the Human Genome Project (HGP) and spent much of her career providing tools and resources to accelerate biological discovery, with the goal of empowering scientists to write genomes through a scalable, efficient approach. We got a chance to talk to her and ask questions about the implications of having a complete human genome reference and what it means for the future of genome engineering.
A complete reference facilitates analysis of biomedically important regions and could potentially help design gene therapies for heritable diseases. It also provides a blueprint for de novo assembly of new sequences, which could make whole-genome sequencing more routine and yield additional insights into human evolution, genetic variability across different populations, and much more.
“We needed to build [the original] reference to gain perspective on what we did not know,” Church said. She suggests that the updated assembly should replace it as the new reference assembly for human genetic studies.
The reference assembly update project highlighted the importance of continuing technological innovation. Complex repetitive sequences in the human genome were a major hurdle for the original HGP, which left about 8% of the genome unmapped. What made the assembly of these complex regions possible was the advent of accurate long-read (~20 kb) and ultra-long-read (>100 kb) sequencing technologies.
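To make the idea of gaps concrete: in sequence files, an unknown base is conventionally written as ‘N’, so the unresolved portion of a sequence can be estimated by counting those characters. The following is a minimal Python sketch of that idea, not the method used by the T2T consortium. The function names and the toy sequence are made up for illustration and are not real genomic data.

```python
import re


def gap_runs(sequence):
    """Return (start, end) pairs for each run of N/n.

    Coordinates are 0-based and the end is exclusive.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def gap_fraction(sequence):
    """Return the fraction of positions that are unknown (N/n)."""
    if not sequence:
        return 0.0
    n_count = sum(end - start for start, end in gap_runs(sequence))
    return n_count / len(sequence)


# Toy sequence (hypothetical): 20 bases with two gaps of 4 and 2 unknown bases.
toy = "ACGTNNNNACGTACGTNNAC"
print(gap_runs(toy))      # positions of the two gaps
print(gap_fraction(toy))  # share of the toy sequence that is unknown
```

On a real assembly the same counting would be applied chromosome by chromosome while streaming through the file, since a human genome is far too large to handle as a single short string like this.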
Having a complete and accurate human genome reference also opens the doors for developing and scaling new technologies like genome editing. This is an excellent example of how knowledge and technology feed into each other, allowing us to make scientific progress.
“If you want to write genomes, you have to have a good quality reference to know what you are changing,” explained Church. A complete sequence is also important for avoiding potential off-target effects, which is critical to ensuring the safety of gene therapies.
Since 2001, millions of human genomes have been sequenced, and we have learned a lot from that data. But Church thinks it’s important to tread carefully when using human genetic data and to consider both the biosecurity and the bioethics implications. To make gene therapies and personalized medicine a reality for all, we must make genomic studies more inclusive and overcome the “genomic divide” that results in underrepresentation of certain groups. At the same time, we must ensure that the data is gathered ethically and that it benefits the groups that have contributed to the research.
To learn more about the details of the T2T project, head over to Science.org to read Church’s perspective on the groundbreaking manuscript from Nurk et al. And for even more coverage of the impact of this project, check out these articles in GenomeWeb, Time, or The Washington Post.