INNUENDO Whole Genome and Core Genome MLST Schemas and Datasets for Campylobacter jejuni

Authors: Mirko Rossi; Mickael Santos Da Silva; Bruno Filipe Ribeiro-Gonçalves; Diogo Nuno Silva; Miguel Paulo Machado; Mónica Oleastro; Vítor Borges; +17 Authors

doi: 10.5281/zenodo.1322563, 10.5281/zenodo.1322564

Abstract

Dataset

Raw reads deposited in the European Nucleotide Archive (ENA) or in the NCBI Sequence Read Archive (SRA) as C. jejuni were retrieved in April 2017. In total, 5,691 genomes that passed the INNUca v3.1 pipeline were selected. Additionally, 566 raw read sets previously published in Kovanen et al., 2016, Llarena et al., 2016, Kovanen et al., 2014, Kovanen et al., 2014 and Garcia-Sanchez et al., 2017 were included. The database also includes 269 C. jejuni genomes belonging to the INNUENDO Sequence Dataset (PRJEB27020). All genomes were assembled using the INNUca v3.1 pipeline and passed its QC.

File 'Metadata/Cjejuni_metadata.txt' contains metadata for each strain, including country and year of isolation, source classification and host taxon, and the classical 7-gene pubMLST ST and CC classification. The directory 'Genomes' contains all 6,526 INNUca v3.1 assemblies of the strains listed in 'Metadata/Cjejuni_metadata.txt'.

Schema creation and validation

Draft genome assemblies were annotated using Prokka and an initial pangenome was defined using Roary. The chewBBACA CreateSchema.py script was used to create a whole genome schema starting from the Roary pangenome. The schema initially comprised 5,447 loci and was populated with the 6,526 C. jejuni genomes. The quality of the loci was assessed using chewBBACA Schema Evaluation. Loci with single alleles and those with high length variability (i.e. more than 1 allele outside the mode +/- 0.05 size) were removed. The wgMLST schema was further curated by excluding all loci detected as “Repeated Loci” and loci annotated as “non-informative paralogous hit (NIPH/NIPHEM)” or “Allele Larger/Smaller than length mode (ALM/ASM)” by the chewBBACA Allele Calling engine in more than 1% of the C. jejuni genomes in the dataset.

File 'Schema/Cjejuni_wgMLST_2795_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 2,795 loci.
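
The two locus-level filters described above can be illustrated with a minimal sketch. It is not the chewBBACA implementation. The function names are hypothetical, "mode +/- 0.05 size" is read here as a window of 5% around the modal allele length, and the four call labels are pooled into a single count per locus; all three are assumptions.

```python
from collections import Counter

# Call labels named in the text. Pooling them into one count is an assumption.
PROBLEM_CALLS = {"NIPH", "NIPHEM", "ALM", "ASM"}


def passes_length_filter(allele_lengths, tolerance=0.05, max_outliers=1):
    """Return True if a locus survives the single-allele and length filters.

    allele_lengths holds one length (in bp) per allele of the locus.
    """
    # Loci with a single allele are removed.
    if len(allele_lengths) < 2:
        return False
    # Modal allele length; ties go to the shortest length (an assumption).
    counts = Counter(allele_lengths)
    highest = max(counts.values())
    mode = min(length for length, n in counts.items() if n == highest)
    # Assumed reading of "mode +/- 0.05 size": a 5% window around the mode.
    low, high = mode * (1 - tolerance), mode * (1 + tolerance)
    outliers = sum(1 for length in allele_lengths if not low <= length <= high)
    # The locus is removed when more than `max_outliers` alleles fall outside.
    return outliers <= max_outliers


def exceeds_flag_threshold(calls, threshold=0.01):
    """Return True if problem calls occur in more than `threshold` of genomes.

    calls holds one allele-calling result per genome for a single locus.
    """
    flagged = sum(1 for call in calls if call in PROBLEM_CALLS)
    return flagged / len(calls) > threshold
```

Under these assumptions a locus is kept when `passes_length_filter` returns True and `exceeds_flag_threshold` returns False. The "Repeated Loci" check is not covered by this sketch.
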
File 'Schema/Cjejuni_cgMLST_678_listGenes.txt' contains the list of genes from the wgMLST schema that define the cgMLST schema. The cgMLST schema consists of 678 loci, defined as the loci present in at least 99.9% of the 6,526 C. jejuni genomes. Genomes have no more than 2% of missing loci.

File 'Allele_Profles/Cjejuni_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profiles of the 6,526 C. jejuni genomes in the dataset. Please note that missing loci follow the annotation of the chewBBACA Allele Calling software.

File 'Allele_Profles/Cjejuni_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profiles of the 6,526 C. jejuni genomes in the dataset. Please note that missing loci are indicated with a zero.

Additional citations

The schemas are prepared to be used with chewBBACA. When using the schemas in this repository, please also cite: Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. 15/03/2018. M Gen 4(3): doi:10.1099/mgen.0.000166 http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000166
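
As an illustration of the cgMLST definition given in the abstract (loci present in at least 99.9% of genomes, missing loci written as zero), the sketch below derives a core locus list and zero-coded profiles from a small tab-separated table. It is a toy example under stated assumptions, not the procedure used by the authors: the in-memory table, the function names, and the rule that a call counts as present only when it is a positive integer (optionally prefixed with "INF-") are all hypothetical.

```python
import csv
import io


def is_present(call):
    """Assumption: present means a positive integer, optionally 'INF-' prefixed."""
    call = call.strip()
    if call.startswith("INF-"):
        call = call[4:]
    return call.isdigit() and int(call) > 0


def core_loci(rows, loci, min_presence=0.999):
    """Loci called as present in at least `min_presence` of the genomes."""
    n = len(rows)
    return [
        locus for locus in loci
        if sum(is_present(row[locus]) for row in rows) / n >= min_presence
    ]


def to_cgmlst(rows, core, id_column):
    """Restrict each profile to the core loci; missing calls become '0'."""
    profiles = []
    for row in rows:
        profile = {id_column: row[id_column]}
        for locus in core:
            call = row[locus].strip()
            if is_present(call):
                profile[locus] = call[4:] if call.startswith("INF-") else call
            else:
                profile[locus] = "0"
        profiles.append(profile)
    return profiles


def missing_fraction(profile, core):
    """Fraction of core loci coded as missing ('0') in one genome."""
    return sum(1 for locus in core if profile[locus] == "0") / len(core)


# Hypothetical three-genome table standing in for a wgMLST profile file.
demo_tsv = (
    "FILE\tlocusA\tlocusB\tlocusC\n"
    "g1\t1\t2\tLNF\n"
    "g2\t1\tINF-3\t4\n"
    "g3\t2\t2\tNIPH\n"
)
reader = csv.DictReader(io.StringIO(demo_tsv), delimiter="\t")
rows = list(reader)
id_column = reader.fieldnames[0]   # first column holds the genome identifier
loci = reader.fieldnames[1:]

core = core_loci(rows, loci)                      # loci kept at the 99.9% level
cg_profiles = to_cgmlst(rows, core, id_column)    # zero-coded core profiles
```

To apply the same logic to 'Allele_Profles/Cjejuni_wgMLST_alleleProfiles.tsv', replace `io.StringIO(demo_tsv)` with an open file handle. Whether the column layout and call labels of that file match this toy table should be checked first, and `missing_fraction` can then be compared against the 2% limit stated above.
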

The raw sequence data of the isolates' genomes produced within the INNUENDO project were submitted to the European Nucleotide Archive (ENA) and are publicly available under the project accession number PRJEB27020. When using the schemas, the assemblies or the allele profiles, please include the project number in your publication.

The research from the INNUENDO project has received funding from the European Food Safety Authority (EFSA), grant agreement GP/EFSA/AFSCO/2015/01/CT2 (New approaches in identifying and characterizing microbial and chemical hazards), and from the Government of the Basque Country. The conclusions, findings, and opinions expressed in this repository reflect only the view of the INNUENDO consortium members and not the official position of EFSA or of the Government of the Basque Country. EFSA and the Government of the Basque Country are not responsible for any use that may be made of the information included in this repository.

The consortium thanks all the researchers and authorities worldwide who contribute by submitting the raw sequences of bacterial strains to public repositories. The project was possible thanks to the support of CSC - Tieteen tietotekniikan keskus Oy (https://www.csc.fi/) and of INCD (http://www.incd.pt/, funded by FCT and FEDER under the project 22153-01/SAICT/2016), which provided access to cloud computing resources.

Related Organizations

University of Veterinary Medicine Vienna, Austria
National Institute of Health Dr. Ricardo Jorge, Portugal
Finnish Food Safety Authority Evira, Finland
University of Helsinki, Finland
University of Lisbon, Portugal


Impact (by BIP!)

selected citations: 8. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.