ReClustOR is a novel clustering method that overcomes some of the problems associated with classical 'heuristic' clustering methods and consequently increases the stability and quality of the reconstructed OTUs. Moreover, the OTU database defined with ReClustOR can be used as a reference and gradually enriched with new studies and samples. In this way, huge datasets such as the Earth Microbiome Project can easily be used as references for smaller projects, thereby increasing the quality of comparisons between studies and datasets.
Here, we propose a new approach called ReClustOR (for RE-CLUSTering method using an Open-Reference approach) to improve OTU consistency (see https://doi.org/10.5281/zenodo.2597402). This strategy combines two previously described clustering methods. First, a classical clustering method (e.g. SWARM or VSEARCH) is used to define OTU centroids and create a reference database. Second, a closed- or open-reference method (depending on the user's choice) is applied to all reads that are not OTU centroids. In contrast to classical clustering methods, each read is compared to all centroids using a distance-based greedy clustering technique (Edgar, 2010; He et al., 2015) and then assigned to the nearest one, thereby correcting erroneous assignments of reads to OTUs.
To highlight the improvements provided by ReClustOR in describing microbial diversity in terms of ecological diversity metrics (e.g. richness, OTU composition, Shannon, 1/Simpson) and taxonomic composition, a simulated dataset was subjected to: (i) ESV definition, (ii) multiple conventional de novo methods (i.e. a homemade de novo clustering close to CRUNCHCLUST, VSEARCH and SWARM), and (iii) ReClustOR computation. This simulated dataset (Almeida et al., 2018) contains a diverse set of genera commonly found in three different ecosystems: human gut, ocean and soil.
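The two-step strategy described above (centroid definition, then reassignment of every remaining read to its nearest centroid) can be illustrated with a minimal Python sketch. This is not the ReClustOR implementation: the function names, the use of a plain edit distance in place of a real sequence aligner, and the `max_dist` threshold are all assumptions made for illustration. Closed- and open-reference modes are modelled in the usual sense, where reads too far from every centroid are either left unassigned or promoted to new centroids.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; a stand-in for the alignment-based distance a real tool would use."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def reassign_reads(
    reads: Dict[str, str],
    centroids: List[str],
    max_dist: int = 1,            # hypothetical threshold, not a ReClustOR parameter
    open_reference: bool = True,
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Assign each read to its nearest centroid.

    Returns (read_id -> centroid sequence or None, final centroid list).
    """
    centroids = list(centroids)  # work on a copy so the caller's list is untouched
    assignment: Dict[str, Optional[str]] = {}
    for read_id, seq in reads.items():
        # Compare the read to ALL centroids and keep the closest one
        # (ties are broken by the lexicographic order of the centroid sequence).
        dist, best = min((edit_distance(seq, c), c) for c in centroids)
        if dist <= max_dist:
            assignment[read_id] = best
        elif open_reference:
            # Open-reference: an unmatched read founds a new OTU centroid.
            centroids.append(seq)
            assignment[read_id] = seq
        else:
            # Closed-reference: an unmatched read stays unassigned.
            assignment[read_id] = None
    return assignment, centroids


# Toy usage with made-up sequences.
centroids = ["ACGTACGT", "TTTTCCCC"]
reads = {"r1": "ACGTACGA", "r2": "TTTTCCCG", "r3": "GGGGGGGG"}
open_assign, open_centroids = reassign_reads(reads, centroids, open_reference=True)
closed_assign, closed_centroids = reassign_reads(reads, centroids, open_reference=False)
```

In the toy data, `r3` is far from both centroids, so it becomes a new centroid in open-reference mode and remains unassigned in closed-reference mode.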
The clustering methods were compared for: (i) their ability to describe microbial richness, (ii) the congruence between OTU assignments and sequence taxonomy, (iii) the robustness of each defined OTU, and (iv) their ability to efficiently describe the microbial community based on OTU composition. The simulated dataset (00_Raw_data) and all analysis steps are provided here so that they can be reused to test ReClustOR and to better understand the files and data produced by this program. More details are available in the Tree_of_data.tree file.
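The diversity metrics named in the description (richness, Shannon, 1/Simpson) can be computed from per-OTU read counts. The sketch below is illustrative only and is not taken from the deposited analysis scripts: the function name and the input layout (a mapping from OTU identifier to read count) are assumptions, Shannon is computed with the natural logarithm, and the Simpson index is taken as the sum of squared proportions.

```python
import math
from typing import Dict


def diversity_metrics(otu_counts: Dict[str, int]) -> Dict[str, float]:
    """Richness, Shannon (natural log) and inverse Simpson for one sample."""
    counts = [n for n in otu_counts.values() if n > 0]  # ignore empty OTUs
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    props = [n / total for n in counts]
    return {
        "richness": float(len(counts)),                      # number of observed OTUs
        "shannon": -sum(p * math.log(p) for p in props),     # H' = -sum(p_i * ln p_i)
        "inv_simpson": 1.0 / sum(p * p for p in props),      # 1/D with D = sum(p_i^2)
    }


# Toy usage with made-up counts: two equally abundant OTUs and one empty OTU.
metrics = diversity_metrics({"OTU_1": 50, "OTU_2": 50, "OTU_3": 0})
```

For two equally abundant OTUs, richness and inverse Simpson both equal 2 and Shannon equals ln 2, which gives a quick sanity check before applying the function to real OTU tables.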
References: Alexandre Almeida, Alex L Mitchell, Aleksandra Tarkowska, Robert D Finn. Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments. GigaScience, Volume 7, Issue 5, May 2018, giy054. https://doi.org/10.1093/gigascience/giy054
Simulated dataset, ReClustOR, VSEARCH, SWARM, Clustering, OTU