Identification of RNA Oligonucleotide and Protein Interactions Using Term Frequency Inverse Document Frequency and Random Forest

Eugene Uwiragiye; Kristen L. Rhinehardt

Part of book or chapter of book . 2023 . Peer-reviewed

License: CC BY

Data sources: Crossref

Publication . Part of book or chapter of book . 29 Mar 2023 . English . Publisher: IntechOpen

doi: 10.5772/intechopen.108819

Abstract

Interactions between proteins and ribonucleic acid (RNA) play crucial roles in many biological processes, such as gene expression, post-transcriptional regulation, and protein synthesis. However, experimental screening of protein-RNA binding affinity is laborious and time-consuming, so there is a pressing need for accurate and reliable computational approaches. In this study, we propose a novel method that predicts these interactions from the sequences of both the protein and the RNA. A Random Forest classifier was trained and tested on a combination of benchmark datasets, and the term frequency–inverse document frequency (TF-IDF) method, combined with the XGBoost algorithm, was used to extract useful information from the sequences. The method achieved an accuracy of up to 94%, an area under the curve (AUC) of 0.98, and a Matthews correlation coefficient (MCC) of 0.90. These high metrics, especially the MCC, suggest that the method is robust enough to maintain its performance on unseen datasets.
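To make the two central quantities in the abstract concrete, the following is a minimal sketch, not the chapter's implementation. It assumes that sequences are split into overlapping k-mers that act as "words", uses the plain TF-IDF definition (term frequency times ln(N/df)), and computes the MCC from confusion-matrix counts. The function names, the choice k = 3, and the toy RNA strings are illustrative assumptions; the XGBoost feature-extraction step and the Random Forest classifier are not reproduced here.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (the 'words' of the document)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(sequences, k=3):
    """Return one {k-mer: weight} dict per sequence.

    tf  = count of the k-mer in the sequence / number of k-mers in the sequence
    idf = ln(N / df), where df is the number of sequences containing the k-mer
    (Assumed textbook variant; libraries often use smoothed forms.)
    """
    docs = [kmers(s, k) for s in sequences]
    n_docs = len(docs)

    # Document frequency: in how many sequences does each k-mer occur?
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {term: (c / total) * math.log(n_docs / df[term])
             for term, c in counts.items()}
        )
    return weights


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy RNA sequences for illustration only (not data from the chapter).
rna = ["AUGGCU", "AUGAAA", "GGCUAA"]
weights = tfidf(rna, k=3)
```

A k-mer that occurs in only one sequence (such as `UGG` above) receives a higher weight than one shared by several sequences (such as `AUG`), which is the property that makes TF-IDF useful for highlighting discriminative sequence fragments. The resulting weight vectors would then serve as input features for a classifier.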

Related Organizations

North Carolina Agricultural and Technical State University
United States

Impact by BIP!

Selected citations: 0 (citations derived from selected sources; an alternative to the "Influence" indicator)
Popularity: Average (the "current" impact or attention of an article in the research community at large, based on the underlying citation network)
Influence: Average (the overall impact of an article in the research community at large, based on the underlying citation network, diachronically)
Impulse: Average (the initial momentum of an article directly after its publication, based on the underlying citation network)

hybrid