
Abstract

Motivation: For machine learning to matter beyond intellectual curiosity, the resulting models must be adopted within the greater scientific community. In this study, we developed an interpretable machine learning framework that allows semantics to be identified from various data types. Our package can analyze and illuminate co-predictive mechanisms that reflect biological processes.

Results: We present R.ROSETTA, an R package for building and analyzing interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling to produce accessible and transparent results that are well suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that help minimize analysis bias and noise. Investigating case-control studies of autism, we showed that our tool provided hypotheses for potential interdependencies among the features that discerned the phenotype classes. These interdependencies involved neurodevelopmental and autism-related genes. Although our sample application of R.ROSETTA was transcriptomic data analysis, R.ROSETTA can be applied to any decision-related omics data.

Availability: The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA.

Contact: mateusz.garbulowski@icm.uu.se (Mateusz Garbulowski), jan.komorowski@icm.uu.se (Jan Komorowski)
Bioinformatics (Computational Biology), Computational Biology, Interpretable machine learning, Machine Learning, Rough sets, Rule-based classification, Transcriptomics, Big data, Data Mining, Case-Control Studies, Algorithms, Software, R package, Biology (General), Computer applications to medicine. Medical informatics, 570, 004, QH301-705.5, R858-859.7
