
representative_2.3M_seq.csv contains representative proteins from structure-based clustering of the AlphaFold structure database. The ESM2-650M last-layer embeddings of these proteins were used to train SAE and MotifAE. SAE_step_80000.pt and MotifAE_step_80000.pt are checkpoints of the two models at 80,000 training steps. SAE was trained with a reconstruction loss and an L1 sparsity penalty; MotifAE was trained with an additional local similarity loss. 412pros_ddG_ML.csv contains deep mutational scanning data on protein folding stability, which is used to train MotifAE-G. 1404_stability_associated_features.pt contains the features selected using MotifAE-G.
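The SAE training objective described above (reconstruction loss plus an L1 sparsity penalty on the latent activations) can be sketched as follows. This is a minimal illustration, not the released training code: the hidden width, L1 coefficient, and architecture details are assumptions, and ESM2-650M's embedding dimension of 1280 is the only value taken from the model itself.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE over d-dimensional embeddings (ESM2-650M: d = 1280).

    d_hidden (8192 here) is a hypothetical overcomplete latent width,
    not the value used for the released checkpoints.
    """
    def __init__(self, d_model=1280, d_hidden=8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # non-negative, sparsity-encouraged latents
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    # reconstruction term + L1 norm on latent activations
    recon = torch.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * z.abs().mean()
    return recon + sparsity

# Usage with random stand-in embeddings (in practice: ESM2-650M last-layer
# embeddings of the 2.3M representative proteins)
x = torch.randn(4, 1280)
model = SparseAutoencoder()
x_hat, z = model(x)
loss = sae_loss(x, x_hat, z)
```

MotifAE would add its local similarity loss as a third term; its exact form is not specified here, so it is omitted from the sketch.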
protein language model, sparse autoencoder
