
Multiple Sequence Alignment of a diverse dataset with 1788 Mycobacterium tuberculosis isolates used for ReporTree benchmarking

The dataset comprises whole-genome sequence data published by Walker et al. 2015. To build the multiple sequence alignment, we proceeded as follows:

1. Reads were downloaded from ENA BioProject PRJNA282721 (accessed on March 16th, 2023) and trimmed using Trimmomatic (Bolger et al., 2014) with INNUca default settings;
2. Quality-processed reads were individually mapped against the H37Rv reference genome (GenBank accession: NC_000962.3) using Snippy v4.5.1, and SNP calling was performed on variant sites with the following criteria: a minimum proportion of reads differing from the reference of 70%, a minimum mapping quality of 30 and a minimum coverage for SNP calling of 10;
3. A full alignment was extracted using Snippy’s core module (snippy-core), with masking of SNPs falling within known M. tuberculosis genomic regions with high GC content, repetitive elements and resistance-associated positions (corresponding to ~8% of the genome), as previously described for surveillance purposes (Macedo et al., 2019);
4. M. tuberculosis lineages were determined using tb-profiler v4.4.1 (Phelan et al., 2019). Samples were excluded if they belonged to a member of the M. tuberculosis complex other than M. tuberculosis, represented a mix of multiple lineages, or had less than 95% of mapped positions in the reference;
5. A filtered alignment comprising the maximum number of informative sites (88,562 nucleotide sites with at least one mutation in a given sequence) was extracted from the full alignment using ReporTree's alignment_processing.py v1.1.0 (default settings), and then used as input for the benchmarking.

In this repository, we provide two alignment files:

- Core_MTB_1787_strs.full.aln: the full multiple sequence alignment comprising 1787 samples and the reference (corresponding to point 4 of the methodology).
- MTb_original_align_profile.fasta: the multiple sequence alignment comprising 1787 samples and the reference, presenting only the informative sites of the alignment (corresponding to point 5 of the methodology).
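
To illustrate what reducing a full alignment to its informative sites means, the following is a minimal Python sketch on a toy alignment. It is not ReporTree's alignment_processing.py and does not reproduce its default settings. The function names (parse_fasta, variable_columns, filter_alignment) and the toy sequences are hypothetical, and the rule used here (keep a column when at least two distinct unambiguous bases among A, C, G and T are observed, ignoring N and gaps) is an assumption made for the example.

```python
from typing import Dict, List

# Assumption for this sketch: only A, C, G, T count as evidence of a mutation;
# N and gap characters are ignored when deciding whether a column varies.
UNAMBIGUOUS = set("ACGT")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {sequence_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def variable_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based indices of columns with >= 2 distinct unambiguous bases."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have equal length")
    keep = []
    for i in range(length):
        bases = {s[i] for s in seqs} & UNAMBIGUOUS
        if len(bases) >= 2:
            keep.append(i)
    return keep


def filter_alignment(alignment: Dict[str, str], columns: List[int]) -> Dict[str, str]:
    """Keep only the selected columns in every sequence."""
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical data, not taken from the dataset).
toy = """>reference
ACGTACGT
>sample_1
ACGAACGT
>sample_2
ACGTNCGC
>sample_3
ACG-ACGT
"""

aln = parse_fasta(toy)
cols = variable_columns(aln)
filtered = filter_alignment(aln, cols)
```

In this toy case, the column where one sample carries only an N and the column where one sample carries only a gap are not counted as variable on their own. A real tool may handle missing data differently, for example by applying per-site or per-sample thresholds on the proportion of called bases.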