
Database of protein sequences obtained using ProHap (https://github.com/ProGenNo/ProHap) on the data set of phased genotypes published by the Haplotype Reference Consortium (HRC), Release 1.1 (https://ega-archive.org/datasets/EGAD00001002729). We used Ensembl v.110 for the mapping of coordinates between genes, exons, and transcripts.

Release 1.1 of the HRC is aligned to the GRCh37 reference genome. We performed a liftover to the GRCh38 reference using GeneBe (https://genebe.net/tools/liftover). Variants whose reported alternative allele matches the reference allele in GRCh38 were removed. A 1% minor allele frequency threshold was then applied to filter the remaining variants. After translation, a frequency threshold of 0.5% was applied to filter the resulting unique non-canonical sequences. The complete configuration file for the ProHap run is attached to this repository.

This dataset contains one compressed directory with the following files:

F1: The concatenated FASTA file, ready to be used with search engines. It contains:
- protein haplotype sequences obtained by ProHap;
- the reference proteome as per Ensembl v. 110;
- contaminant sequences from the cRAP project (https://www.thegpm.org/crap/).
The file is provided in two formats, full and simplified. The simplified FASTA contains only the artificial protein identifier and the matching gene name, and is optimised for compatibility with a wide range of tools. For annotation of peptides using the PeptideAnnotator, please provide the header (F1.2) in addition to the FASTA file.

F2: Additional information about the haplotype sequences, to be used for mapping identified peptides to the original haplotypes.

F3: Translations of haplotype cDNA sequences, before merging with the reference proteome.

For further description of the files, please refer to https://github.com/ProGenNo/ProHap/wiki/Output-files.
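The variant and sequence filtering steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ProHap's actual implementation: the `Variant` record, the `ref_lookup` callback, the inclusive (`>=`) threshold boundaries, and the toy genome and frequencies are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Variant:
    chrom: str
    pos: int         # 1-based position on GRCh38 (after liftover)
    ref: str         # reference allele as reported in the source data
    alt: str         # alternative allele as reported in the source data
    alt_freq: float  # frequency of the ALT allele in the cohort


# Hypothetical callback: returns `length` bases of GRCh38 starting at 1-based `pos`
RefLookup = Callable[[str, int, int], str]


def minor_allele_frequency(alt_freq: float) -> float:
    # The minor allele is whichever of REF/ALT is rarer in the cohort
    return min(alt_freq, 1.0 - alt_freq)


def filter_variants(variants: List[Variant], ref_lookup: RefLookup,
                    maf_threshold: float = 0.01) -> List[Variant]:
    kept = []
    for v in variants:
        # Drop variants whose ALT allele is the GRCh38 reference sequence at that position
        if ref_lookup(v.chrom, v.pos, len(v.alt)) == v.alt:
            continue
        # Keep only variants at or above the MAF threshold (inclusive boundary assumed)
        if minor_allele_frequency(v.alt_freq) >= maf_threshold:
            kept.append(v)
    return kept


def filter_sequences(seq_freqs: Dict[str, float],
                     freq_threshold: float = 0.005) -> Dict[str, float]:
    # seq_freqs maps each unique non-canonical protein sequence to its frequency
    return {seq: f for seq, f in seq_freqs.items() if f >= freq_threshold}


# Toy usage: a made-up 10-base "chromosome 1"
toy_genome = {"1": "ACGTACGTAC"}


def toy_lookup(chrom: str, pos: int, length: int) -> str:
    return toy_genome[chrom][pos - 1:pos - 1 + length]


variants = [
    Variant("1", 2, "C", "G", 0.20),   # kept
    Variant("1", 3, "A", "G", 0.40),   # ALT equals GRCh38 base 'G' -> removed
    Variant("1", 5, "A", "T", 0.005),  # MAF 0.5% < 1% -> removed
]
kept = filter_variants(variants, toy_lookup)
seqs = filter_sequences({"MKTAYIAK": 0.12, "MKTAYIAR": 0.003})
```

Only the first toy variant and the first toy sequence survive. The order mirrors the description: the reference-allele check and MAF filter act on variants before translation, while the 0.5% threshold acts on the translated non-canonical sequences.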
For guidance on using these databases with search engines and on the downstream analysis of identified peptides, please refer to the project's wiki page: https://github.com/ProGenNo/ProHap/wiki/Using-the-database-for-proteomic-searches. When using these databases in your publication, please cite: Vašíček, J., Kuznetsova, K.G., Skiadopoulou, D. et al. ProHap enables human proteomic database generation accounting for population diversity. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02506-0
haplotypes, proteogenomics, protein database
