
Abstract

Background
Since the initial publication of clusterMaker, the need for tools to analyze large biological datasets has only increased. New datasets are significantly larger than a decade ago, and new experimental techniques such as single-cell transcriptomics continue to drive the need for clustering or classification techniques to focus on portions of datasets of interest. While many libraries and packages exist that implement various algorithms, there remains the need for clustering packages that are easy to use, integrated with visualization of the results, and integrated with other commonly used tools for biological data analysis. clusterMaker2 has added several new algorithms, including two entirely new categories of analyses: node ranking and dimensionality reduction. Furthermore, many of the new algorithms have been implemented using the Cytoscape jobs API, which provides a mechanism for executing remote jobs from within Cytoscape. Together, these advances facilitate meaningful analyses of modern biological datasets despite their ever-increasing size and complexity.

Results
The use of clusterMaker2 is exemplified by reanalyzing the yeast heat shock expression experiment that was included in our original paper; however, here we explored this dataset in significantly more detail. Combining this dataset with the yeast protein–protein interaction network from STRING, we were able to perform a variety of analyses and visualizations from within clusterMaker2, including Leiden clustering to break the entire network into smaller clusters, hierarchical clustering to look at the overall expression dataset, dimensionality reduction using UMAP to find correlations between our hierarchical visualization and the UMAP plot, fuzzy clustering, and cluster ranking. Using these techniques, we were able to explore the highest-ranking cluster and determine that it represents a strong contender for proteins working together in response to heat shock. We found a series of clusters that, when re-explored as fuzzy clusters, provide a better presentation of mitochondrial processes.

Conclusions
clusterMaker2 represents a significant advance over the previously published version, and most importantly, provides an easy-to-use tool to perform clustering and to visualize clusters within the Cytoscape network context. The new algorithms should be welcome to the large population of Cytoscape users, particularly the new dimensionality reduction and fuzzy clustering techniques.
Bioinformatics, QH301-705.5, Computer applications to medicine. Medical informatics, Bioinformatics and Computational Biology, R858-859.7, Mathematical sciences, Bioengineering, Saccharomyces cerevisiae, Mathematical Sciences, Clustering, Information and Computing Sciences, Cluster Analysis, Protein Interaction Maps, Biology (General), Visualization, Community detection, Cytoscape, Biological Sciences, Mobile Applications, Biological sciences, Networking and Information Technology R&D (NITRD), Information and computing sciences, Network analysis, Generic health relevance, Software, Algorithms
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 31 |
| Popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
