Descripción del proyecto
DNA sequencing has transformed biological research, and is fast becoming the standard-of-care in medicine. Chances are good that every European person born today may have their genome sequenced at some point in their lifetime (it is already happening in Iceland). The currently prevailing sequencing platform, Illumina short-read sequencing, prioritizes content (DNA basepairs) over preserving long-range haplotype context. This decision has paid off so far, but comes at the cost of complicated analyses and often lead to undetected disease-causing chromosome rearrangements. Alternative long-read technologies, however, have yet to deliver the scale and throughput to truly take over from Illumina sequencing. To meet the grand challenge of sequencing humanity and biodiversity, we need both content and context. As part of our ERC StG action, we have invented haplotagging, a technique that restores haplotype context while integrating seamlessly into Illumina sequencing. We have previously published a small-scale pilot study that haplotagging can deliver superior results at a fraction of existing costs. In doing so our team have solved a series of molecular engineering and computational challenges that have stumped far larger parties. However, our prototype dataset are still not large enough to fully allay concerns from potential licensees regarding robust scalability. Thus, we propose to generate a mid-scale, 2000-sample pilot with direct support from Illumina to provide additional benchmark data, guide scale-up efforts and inform decision-making by industry and academic stakeholders. We expect this proposal will help transition haplotagging from a research concept towards the market.