
representative_2.3M_seq.csv contains representative proteins from structure-based clustering of the AlphaFold structure database. The ESM2-650M last-layer embeddings of these proteins were used to train SAE and MotifAE. SAE_step_80000.pt and MotifAE_step_80000.pt are checkpoints of the two models at 80,000 training steps. SAE was trained with a reconstruction loss and an L1 sparsity penalty; MotifAE was trained with an additional local similarity loss. 412pros_ddG_ML.csv contains deep mutational scanning data on protein folding stability, which is used to train MotifAE-G. 1404_stability_associated_features.pt contains the features selected using MotifAE-G.
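The SAE training objective described above (reconstruction loss plus an L1 sparsity penalty on the latent activations) can be sketched as follows. This is a minimal illustration, not the released training code: the hidden width, L1 coefficient, and architecture details are assumptions, and ESM2-650M's embedding dimension of 1280 is the only value taken from the model itself.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE over d-dimensional embeddings (ESM2-650M: d = 1280).

    d_hidden (8192 here) is a hypothetical overcomplete latent width,
    not the value used for the released checkpoints.
    """
    def __init__(self, d_model=1280, d_hidden=8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # non-negative, sparsity-encouraged latents
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    # reconstruction term + L1 norm on latent activations
    recon = torch.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * z.abs().mean()
    return recon + sparsity

# Usage with random stand-in embeddings (in practice: ESM2-650M last-layer
# embeddings of the 2.3M representative proteins)
x = torch.randn(4, 1280)
model = SparseAutoencoder()
x_hat, z = model(x)
loss = sae_loss(x, x_hat, z)
```

MotifAE would add its local similarity loss as a third term; its exact form is not specified here, so it is omitted from the sketch.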
protein language model, sparse autoencoder
