
pmid: 37386248
pmc: PMC10335929
Abstract: The UK Biobank performed whole-genome sequencing (WGS) and whole-exome sequencing (WES) across hundreds of thousands of individuals, allowing researchers to study the effects of both common and rare variants. Haplotype phasing separates the two inherited copies of each chromosome into haplotypes and unlocks novel analyses at the haplotype level. In this work, we describe a new phasing method, SHAPEIT5, that accurately and rapidly phases large sequencing datasets, and we illustrate its key features on the UK Biobank WGS and WES data. First, we show that it phases rare variants with high accuracy. For instance, variants found in 1 sample out of 100,000 in the WES data are phased with accuracy above 95%. Second, we show that it can phase singletons, although with moderate accuracy, thereby making their inclusion in downstream analyses possible. Third, we show that using the UK Biobank as a reference panel increases the accuracy of genotype imputation, and that this increase is more pronounced when the panel is phased with SHAPEIT5 than with other methods. Finally, we screen the phased WES data for loss-of-function (LoF) compound heterozygous (CH) events and identify 549 genes in which both gene copies are found knocked out. This list of genes complements current knowledge of gene essentiality in the human genome. We provide SHAPEIT5 in an open-source format, giving researchers the means to leverage haplotype information in genetic studies.
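To illustrate why phase matters for the compound heterozygous screen mentioned in the abstract, here is a minimal sketch. It is not SHAPEIT5 code or the authors' pipeline. The function name `is_compound_het` and the tuple encoding of phased genotypes are hypothetical choices made for this example. It assumes each LoF variant in a gene is given as a phased pair (allele on haplotype 1, allele on haplotype 2), with 1 marking the LoF allele.

```python
from typing import List, Tuple

# Hypothetical encoding: (allele on haplotype 1, allele on haplotype 2); 1 = LoF allele.
PhasedGenotype = Tuple[int, int]


def is_compound_het(genotypes: List[PhasedGenotype]) -> bool:
    """Return True if heterozygous LoF variants hit both haplotypes of the gene.

    Two heterozygous LoF variants in trans (one on each haplotype) leave no
    intact gene copy; the same two variants in cis leave one copy intact.
    """
    hets = [g for g in genotypes if sum(g) == 1]  # keep heterozygous sites only
    hits_hap1 = any(g == (1, 0) for g in hets)
    hits_hap2 = any(g == (0, 1) for g in hets)
    return hits_hap1 and hits_hap2


trans = [(1, 0), (0, 1)]  # LoF alleles on opposite haplotypes
cis = [(1, 0), (1, 0)]    # LoF alleles on the same haplotype
print(is_compound_het(trans))  # True
print(is_compound_het(cis))    # False
```

Unphased data would show the same two heterozygous genotypes in both cases, which is why this kind of screen depends on accurate phasing of rare variants.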
Keywords: Technical Report; Genotype; Haplotypes; Humans; Biological Specimen Banks; Exome Sequencing; Sequence Analysis, DNA/methods; Genome, Human/genetics; United Kingdom; Polymorphism, Single Nucleotide/genetics
| selected citations | 155 |
| popularity | Top 1% |
| influence | Top 10% |
| impulse | Top 0.1% |
