
This dataset contains the outputs of the AlphaFold model for 4,581 proteins that are relevant targets in drug discovery. More information on the dataset can be found at the following repository:

Dataset structure:

↓ data/* -> main data directory
↓ data/PID/* -> data of a single protein of length L

| Filename | Description | Tensor shape | Lightweight |
| --- | --- | --- | --- |
| single.npy | (s_i) evoformer single representation | [L x 384] | ✔️ |
| structure.npy | (a_i) output of the last layer of structure module | [L x 384] | ✔️ |
| msa.npy*** | (m_si) processed MSA representation | [N x L x 256] | |
| pair.npy*** | (z_ij) evoformer pair representation | [L x L x 128] | |
| PID.pdb | 3D protein structure prediction | | ✔️ |
| PID_unrelaxed.pdb | 3D protein structure prediction w/o relaxation step (D) | | ✔️ |
| confidence.npy* | confidence in structure prediction (0-100) | 1 | ✔️ |
| plldt.npy* | confidence in structure prediction per residue | [L] | ✔️ |
| PID.fasta | protein amino acid sequence and metadata | | ✔️ |
| timings.json | Processing log | | ✔️ |

↓ data/PID2/* -> data of protein #2
...

*Note: L: sequence length, N: number of aligned sequences via MSA.
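As an illustration of the layout described above, the lightweight arrays of one protein could be read with NumPy roughly as follows. This is a minimal sketch, not part of the dataset's own tooling: the function name `load_lightweight` and the returned dictionary keys are hypothetical, the file names are taken as written in the listing, and it is an assumption that `confidence.npy` holds a single value that can be read as a float.

```python
from pathlib import Path

import numpy as np


def load_lightweight(protein_dir):
    """Load the lightweight arrays of one protein folder (e.g. data/PID).

    Hypothetical helper: file names and shapes follow the dataset listing,
    everything else is an illustrative assumption.
    """
    protein_dir = Path(protein_dir)

    single = np.load(protein_dir / "single.npy")        # listed as [L x 384]
    structure = np.load(protein_dir / "structure.npy")  # listed as [L x 384]
    per_residue = np.load(protein_dir / "plldt.npy").reshape(-1)  # listed as [L]
    confidence = np.load(protein_dir / "confidence.npy")          # listed as 1 value

    # Take the sequence length L from the single representation and
    # check the other arrays against the shapes given in the listing.
    length = single.shape[0]
    if single.shape != (length, 384):
        raise ValueError(f"unexpected single.npy shape: {single.shape}")
    if structure.shape != (length, 384):
        raise ValueError(f"unexpected structure.npy shape: {structure.shape}")
    if per_residue.shape[0] != length:
        raise ValueError(f"unexpected plldt.npy shape: {per_residue.shape}")

    return {
        "length": length,
        "single": single,
        "structure": structure,
        "per_residue_confidence": per_residue,
        # Assumption: the file stores one number (scalar or length-1 array).
        "confidence": float(np.asarray(confidence).reshape(-1)[0]),
    }
```

A call such as `load_lightweight("data/PID")`, with `PID` replaced by an actual protein folder name, would return the arrays together with the inferred sequence length. The larger `msa.npy` and `pair.npy` files are left out here because the listing does not mark them as lightweight.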
