
Ribonucleic acid (RNA) plays a variety of crucial roles in fundamental biological processes. Recently, RNA has become an interesting drug target, emphasizing the need to improve our understanding of its structures and functions. Over the years, sequencing technologies have produced an enormous amount of unlabeled RNA data, which hides important knowledge and potential. Motivated by the successes of protein language models, we introduce the RiboNucleic Acid Language Model (RiNALMo) to help unveil the hidden code of RNA. RiNALMo is the largest RNA language model to date, with 650 million parameters, pre-trained on 36 million non-coding RNA sequences from several available databases. RiNALMo is able to extract hidden knowledge and capture the underlying structure information implicitly embedded within the RNA sequences. RiNALMo achieves state-of-the-art results on several downstream tasks. Notably, we show that its generalization capabilities can overcome the inability of other deep learning methods for secondary structure prediction to generalize to unseen RNA families.
Weights for the RiNALMo model used in "RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks".

Files:
rinalmo__pretrained.pt: Weights of pretrained RiNALMo (MLM pretraining). Configurations: micro (33M), mega (150M), giga (650M)
rinalmo_giga_ss__ft.pt: Weights of RiNALMo fine-tuned for secondary structure prediction (with prediction head)
rinalmo_giga_mrl_ft.pt: Weights of RiNALMo fine-tuned for mean ribosome load prediction (with prediction head)
rinalmo_giga_splice__ft.pt: Weights of RiNALMo fine-tuned for splice site prediction (with prediction head)
rinalmo_giga_ncrna_class__noise_ft.pt: Weights of RiNALMo fine-tuned for ncRNA classification (with prediction head)
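
The pretrained weights come from masked language modeling (MLM), in which some nucleotides are hidden and the model learns to recover them from the surrounding sequence. The snippet below is a minimal, standard-library sketch of how such a training example can be built. It is an illustration only and not the authors' pipeline: the `<mask>` symbol, the 15% masking rate, the function name `mask_sequence`, and the example sequence are all assumptions, and common MLM refinements (such as occasionally substituting a random token instead of the mask) are left out.

```python
import random

# Hypothetical mask symbol; not taken from the RiNALMo vocabulary.
MASK_TOKEN = "<mask>"


def mask_sequence(seq, mask_rate=0.15, seed=None):
    """Build one simplified MLM example from an RNA sequence.

    Returns (tokens, targets): tokens is the sequence with some positions
    replaced by MASK_TOKEN, and targets maps each masked position to the
    original nucleotide a model would be asked to recover.
    The 15% default rate is an assumption, not a value from the source.
    """
    rng = random.Random(seed)
    # One token per nucleotide; treat DNA-style T as U.
    tokens = list(seq.upper().replace("T", "U"))
    if not tokens:
        return [], {}
    # Mask at least one position, but never more than the sequence length.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    targets = {i: tokens[i] for i in positions}
    for i in positions:
        tokens[i] = MASK_TOKEN
    return tokens, targets


# Made-up example sequence, for illustration only.
masked, targets = mask_sequence("GGGAAACUUCGGUUUCCC", seed=0)
print(" ".join(masked))
print(targets)
```

During pretraining, a model would receive `masked` as input and be scored on how well it predicts the nucleotides stored in `targets`.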
Deep Learning, Foundation Model, Structural Biology, RNA, Language Model, Biology
