Multiple Sequence Alignment of a diverse dataset with 1788 Mycobacterium tuberculosis isolates

Keywords: 3. Good health

Type: Research data, Dataset
Date: 27 Mar 2023
Publisher: Zenodo
Funded by: EC | One Health EJP

Authors: Mixão, Verónica; Pinto, Miguel; Sobral, Daniel; Di Pasquale, Adriano; Gomes, João Paulo; Borges, Vítor

DOI: 10.5281/zenodo.7772652, 10.5281/zenodo.7772651

Abstract

Multiple Sequence Alignment of a diverse dataset with 1788 Mycobacterium tuberculosis isolates, used for ReporTree benchmarking. The dataset comprises whole-genome sequence data published by Walker et al. (2015). The multiple sequence alignment was generated as follows:

1. Reads were downloaded from ENA BioProject PRJNA282721 (accessed on March 16th, 2023) and trimmed using Trimmomatic (Bolger et al., 2014) with INNUca default settings.
2. Quality-processed reads were individually mapped against the H37Rv reference genome (GenBank accession: NC_000962.3) using Snippy v4.5.1, and SNPs were called at variant sites with the following criteria: a minimum proportion of reads differing from the reference of 70%, a minimum mapping quality of 30, and a minimum coverage of 10.
3. A full alignment was extracted using Snippy's core module (snippy-core), masking SNPs that fall within known M. tuberculosis genomic regions with high GC content, repetitive elements, or resistance-associated positions (together ~8% of the genome), as previously described for surveillance purposes (Macedo et al., 2019).
4. M. tuberculosis lineages were determined using tb-profiler v4.4.1 (Phelan et al., 2019). Samples belonging to M. tuberculosis complex species other than M. tuberculosis, samples representing a mix of multiple lineages, and samples with fewer than 95% of reference positions mapped were excluded.
5. A filtered alignment comprising the maximum number of informative sites (88,562 nucleotide sites at which at least one sequence carries a mutation) was extracted from the full alignment using alignment_processing.py v1.1.0 of ReporTree (default settings), and then used as input for the benchmarking.

This repository provides two alignment files:

- Core_MTB_1787_strs.full.aln: the full multiple sequence alignment comprising 1787 samples and the reference (point 4 of the methodology).
- MTb_original_align_profile.fasta: the multiple sequence alignment comprising 1787 samples and the reference, restricted to the informative sites (point 5 of the methodology).
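The SNP-calling criteria described above (at least 70% of reads differing from the reference, mapping quality of at least 30, coverage of at least 10) can be illustrated with a small Python sketch. It is not Snippy's implementation: the `ReadBase` record and `call_snp` function are hypothetical, and the sketch assumes that the mapping-quality threshold removes reads before depth and proportions are computed. In Snippy itself these thresholds correspond, as far as we know, to the `--minfrac`, `--mapqual` and `--mincov` options.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Thresholds as stated in the dataset description.
MIN_DIFF_FRACTION = 0.70  # minimum proportion of reads differing from the reference
MIN_MAPQ = 30             # reads below this mapping quality are ignored (assumption: per-read filter)
MIN_COVERAGE = 10         # minimum number of usable reads needed to call a SNP


@dataclass
class ReadBase:
    """Hypothetical record: the base one read shows at a site and that read's mapping quality."""
    base: str
    mapq: int


def call_snp(ref_base: str, reads: List[ReadBase]) -> Optional[str]:
    """Return the most frequent non-reference base if the site passes all criteria, else None."""
    # Discard low-mapping-quality reads before computing depth and proportions.
    usable = [r.base.upper() for r in reads if r.mapq >= MIN_MAPQ]
    if len(usable) < MIN_COVERAGE:
        return None  # insufficient coverage to make a call
    differing = [b for b in usable if b != ref_base.upper()]
    if len(differing) / len(usable) < MIN_DIFF_FRACTION:
        return None  # too few reads support a difference from the reference
    # Report the most common alternative base (ties go to the first one seen).
    return Counter(differing).most_common(1)[0][0]


reads = [ReadBase("T", 60)] * 8 + [ReadBase("C", 60)] * 2
print(call_snp("C", reads))  # T
```

Applying the mapping-quality filter first means a site covered only by poorly mapped reads fails the coverage test rather than producing a low-confidence call.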
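The three sample-exclusion criteria (species other than M. tuberculosis, mixed lineages, fewer than 95% of reference positions mapped) can be sketched as a single predicate. This is a hypothetical illustration, not tb-profiler's logic or output format: the species label, the list of lineage calls, and the set of characters treated as "unmapped" (`-`, `N`, `n`) are all assumptions, and the mapped fraction is assumed to be computed on the per-sample sequence aligned to the reference before masking.

```python
from typing import Iterable

TARGET_SPECIES = "Mycobacterium tuberculosis"  # hypothetical species label
MIN_MAPPED_FRACTION = 0.95
# Assumption: these characters mark reference positions without a usable base call.
UNCALLED = frozenset("-Nn")


def mapped_fraction(aligned_seq: str) -> float:
    """Fraction of reference positions at which the sample has a base call."""
    if not aligned_seq:
        raise ValueError("empty aligned sequence")
    called = sum(1 for c in aligned_seq if c not in UNCALLED)
    return called / len(aligned_seq)


def is_retained(species: str, lineages: Iterable[str], aligned_seq: str) -> bool:
    """Apply the three exclusion criteria; True means the sample is kept."""
    if species != TARGET_SPECIES:
        return False  # another member of the M. tuberculosis complex
    if len(set(lineages)) != 1:
        return False  # mixed lineages (assumption: no lineage call is also dropped)
    return mapped_fraction(aligned_seq) >= MIN_MAPPED_FRACTION


print(is_retained(TARGET_SPECIES, ["lineage4"], "A" * 96 + "N" * 4))  # True
```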
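The idea of keeping only informative sites, i.e. alignment columns where at least one sequence differs from the reference, can be sketched in Python on a FASTA-formatted alignment such as Core_MTB_1787_strs.full.aln. This is a simplified stand-in, not ReporTree's alignment_processing.py: the functions are hypothetical, the reference sequence is assumed to be named `Reference`, and only unambiguous bases (A, C, G, T) count as differences, so gaps and `N` never make a column informative.

```python
from typing import Dict, List

BASES = frozenset("ACGT")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence name: sequence}; the name is the first word of the header."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {k: "".join(v).upper() for k, v in records.items()}


def informative_sites(alignment: Dict[str, str], reference: str = "Reference") -> List[int]:
    """Return 0-based column indices where some sequence has a base differing from the reference."""
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    ref = alignment[reference]
    keep = []
    for i, ref_base in enumerate(ref):
        if any(seq[i] in BASES and seq[i] != ref_base for seq in alignment.values()):
            keep.append(i)
    return keep


def extract_columns(alignment: Dict[str, str], columns: List[int]) -> Dict[str, str]:
    """Build a reduced alignment containing only the selected columns."""
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


aln = read_fasta(">Reference\nACGTACGT\n>s1\nACGTACGA\n>s2\nATGTAC-T\n")
sites = informative_sites(aln)
print(sites)  # [1, 7]
print(extract_columns(aln, sites))  # {'Reference': 'CT', 's1': 'CA', 's2': 'TT'}
```

The pure-Python column scan is meant only to show the logic; on a whole-genome alignment of 1788 sequences a vectorised approach would be far more practical.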

Impact by BIP!

- Citations: 0. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.