Residue and sequence embeddings of the human proteome (SwissProt entries for organism Human, downloaded on 2021.06.09), computed with bio_embeddings (bioembeddings.com) using the ProtT5 embedder at full precision (https://www.biorxiv.org/content/10.1101/2020.07.12.199554v3).

Additionally:
- Sequence-level predictions of subcellular localization in 10 classes using Light Attention (LA) (https://www.biorxiv.org/content/10.1101/2021.04.25.441334v1)
- Residue-level three-state secondary structure predictions (helix, sheet, or other) using the models reported in the ProtTrans paper (https://www.biorxiv.org/content/10.1101/2020.07.12.199554v3)

Files included:
- human.fasta --> FASTA-formatted human sequences from SwissProt.
- DSSP3_human_ProtT5Sec.fasta --> Three-state secondary structure predictions for each residue of each protein in human.fasta. "H" stands for Helix; "E" stands for Sheet; "C" stands for Other.
- subcell_human_LA_ProtT5.csv --> Subcellular location (10 states) and membrane-boundness (2 states) for each protein in human.fasta.
- embeddings_file.h5 --> Per-residue embeddings of the sequences in human.fasta. Each dataset in the .h5 file represents one protein sequence and contains a matrix of shape L x 1024, where L is the length of the protein sequence. Datasets are indexed using integers. The original sequence identifier (from the FASTA header) can be accessed through the "original_id" attribute. See https://docs.bioembeddings.com/v0.2.0/notebooks/open_embedding_file.html for information on how to open the file.
- reduced_embeddings_file.h5 --> Per-sequence embeddings of the sequences in human.fasta, obtained by mean-pooling the residue embeddings along the length dimension of the protein sequence. Each dataset in the .h5 file represents one protein sequence and contains a vector of size 1024, so every sequence has the same embedding dimension regardless of its length.
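The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not code from the dataset authors: the helper names `read_fasta`, `state_fractions`, and `load_embeddings` are hypothetical, the FASTA helper assumes that header lines can serve directly as lookup keys, and `load_embeddings` assumes the third-party `h5py` package is installed and that the "original_id" attribute is stored on each dataset as described.

```python
from collections import Counter


def read_fasta(path):
    """Return {header: sequence} for a FASTA-formatted file.

    Assumption: the full header line (without '>') is used as the key.
    """
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:].strip()
                records[header] = []
            elif header is not None:
                # Sequences may be wrapped over several lines.
                records[header].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def state_fractions(prediction):
    """Fraction of residues labelled H (helix), E (sheet), and C (other)."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: counts[state] / total if total else 0.0 for state in "HEC"}


def load_embeddings(path):
    """Return {original_id: array} from an embeddings .h5 file.

    Assumption: every top-level entry is a dataset carrying an
    "original_id" attribute; the integer key is used as a fallback.
    """
    import h5py  # third-party; imported lazily so the FASTA helpers work without it

    embeddings = {}
    with h5py.File(path, "r") as handle:
        for key, dataset in handle.items():
            original_id = dataset.attrs.get("original_id", key)
            if isinstance(original_id, bytes):
                original_id = original_id.decode()
            embeddings[original_id] = dataset[()]  # read the full array
    return embeddings
```

With the files from this record, `read_fasta("DSSP3_human_ProtT5Sec.fasta")` would give one H/E/C string per protein, `state_fractions` summarizes such a string, and `load_embeddings("reduced_embeddings_file.h5")` should yield one vector of size 1024 per protein. Loading embeddings_file.h5 this way reads every per-residue matrix into memory, so iterating over the datasets one at a time may be preferable for the full proteome.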
protein embeddings, protein secondary structure, protein subcellular location, protein prediction, protein language models, human proteome