
This dataset is version 2 of zenodo.org/records/14080821 and accompanies the paper "Pool PaRTI: A PageRank-Based Pooling Method for Identifying Critical Residues and Enhancing Protein Sequence Representations." It covers two protein language models (PLMs), ESM-2 650M and protBERT, and more than 20,000 UniProt proteins, including all Homo sapiens proteins. For each model, the npz files contain 1) the protein sequence embeddings generated by Pool PaRTI and 2) the importance weight that Pool PaRTI assigns to each residue of every protein. Individual proteins are indexed by their UniProt accession codes. To generate sequence embeddings or residue importance values for sequences not in the dataset, please use the repository linked here: github.com/Helix-Research-Lab/Pool_PaRTI.git. You can also contact the authors with any questions.
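The npz files are standard NumPy archives, so they can be opened with `numpy.load`. What follows is a minimal sketch under stated assumptions, not the authors' loader. It assumes each archive maps UniProt accession codes to arrays. It also assumes a residue-weight array is 1-D with one value per residue. Whether special tokens are included is not stated here, so check the repository. The helper names `load_npz_as_dict` and `top_k_residues` are hypothetical, and the demo writes synthetic values rather than data from this dataset.

```python
import os
import tempfile

import numpy as np


def load_npz_as_dict(path):
    """Load every array in an .npz archive into a plain dict keyed by array name.

    Assumption (not confirmed by the dataset description): the array names
    inside the archive are UniProt accession codes.
    """
    with np.load(path) as archive:
        # Indexing the NpzFile reads each array into memory, so the dict
        # stays valid after the archive is closed.
        return {key: archive[key] for key in archive.files}


def top_k_residues(weights, k=5):
    """Return 0-based indices of the k residues with the highest weights, highest first.

    Assumes `weights` is a 1-D array with one importance value per residue.
    UniProt residue positions are 1-based, so add 1 to map onto them.
    """
    weights = np.asarray(weights, dtype=float)
    k = min(k, weights.size)
    return np.argsort(weights)[::-1][:k].tolist()


if __name__ == "__main__":
    # Synthetic stand-in data; NOT taken from the dataset.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_weights.npz")
        np.savez(path, ACCESSION_A=np.array([0.05, 0.40, 0.10, 0.30, 0.15]))
        weights = load_npz_as_dict(path)
        print(top_k_residues(weights["ACCESSION_A"], k=2))  # [1, 3]
```

With real files, replace the synthetic archive with the downloaded npz path and the placeholder key `ACCESSION_A` with an actual accession code, after confirming the archive layout.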
