Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC 0
Data sources: ZENODO
ZENODO
Dataset . 2024
License: CC 0
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC 0
Data sources: Datacite
ZENODO
Dataset . 2024
License: CC 0
Data sources: Datacite
Data INRAE
Dataset . 2021
Data sources: B2FIND

A catalog of genes and species of the human oral microbiota

Authors: Le Chatelier, Emmanuelle; Almeida, Mathieu; Plaza Oñate, Florian; Pons, Nicolas; Gauthier, Franck; Ghozlane, Amine; Ehrlich, Stanislav Dusko; +2 Authors

Abstract

Dataset overview

This dataset provides:

  • a non-redundant, high-quality catalog of 8.4 million genes
  • 853 Metagenomic Species Pangenomes (MSPs)

This dataset can be used to analyze shotgun sequencing data of the human oral microbiota.

Methods

Data Sources

The oral gene catalog was built using three primary sources:

  • Bacterial genomes from the Human Oral Microbiome Database (HOMD).
  • Fungal genomes from the NCBI RefSeq database.
  • Metagenomic sequencing data from multiple oral microbiome studies.

The creation of the oral gene catalog was a multi-step process, combining and refining genes from each source.

Bacterial Genes

A total of 1,505 bacterial genomes were downloaded from HOMD (version 20170215, accessed in December 2017). Genes shorter than 60 nucleotides or containing ambiguous bases were filtered out. Redundancy was removed using CD-HIT-EST (v4.6; parameters: -aS 0.9 -c 0.95 -T 0 -M 0 -t 0 -d 0 -G 0). This process yielded 1,459,394 unique HOMD genes for the catalog.

Fungal Genes

A total of 1,017 fungal genomes were downloaded from NCBI RefSeq (May 2017). For the 492 genomes lacking existing annotations, gene calling was performed using GeneMark-ES in fungi mode. After initial redundancy removal with CD-HIT-EST (v4.6; parameters: -aS 0.9 -c 0.95 -T 0 -M 0 -t 0 -d 0 -G 0), genes were selected for inclusion only if their corresponding genome was present in at least 20% of the samples in one of the metagenomic cohorts, as determined by mapping reads with Bowtie2 (v2.2.3). This led to the selection of 2,440,644 fungal genes.

Metagenomic Sequencing Data

The gene catalog was supplemented with data from 689 oral metagenomes, including newly sequenced samples, from the following studies:

  • Human Microbiome Project (HMP): 382 samples (bioproject PRJNA255439).
  • Chinese Cohort: 212 samples (bioproject PRJEB6997).
  • TwinsUK Cohort: 48 newly sequenced samples (bioproject PRJEB38483).

Raw reads were subjected to quality control and trimmed using AlienTrimmer 0.4.0 (parameters: -k 10 -l 45 -m 5 -p 40 -q 20). Human sequences were removed by mapping against the human reference genome (GRCh38.p11) using Bowtie2 2.2.3. Metagenomic assembly was performed using SPAdes 3.9.0 (parameters: "-k 21,33,55 --only-assembler --meta" for Illumina paired-end data, or "--iontorrent -t 24 -m 300 -k 21,33,55 --only-assembler" for Ion Torrent single-end data). Contigs shorter than 500 bp or with coverage below 2x were discarded. Gene calling was conducted with Prodigal (parameters: -m -p meta). Genes shorter than 60 bp were filtered out, and redundancy was removed with CD-HIT-EST (v4.6; parameters: -aS 0.9 -c 0.95 -T 0 -M 0 -t 0 -d 0 -G 0).

Final Gene Catalog

The final gene catalog was assembled by sequentially adding non-redundant genes from each data source. Genes from HOMD and fungal genomes were combined first using cd-hit-est-2d. Then, non-redundant genes from the HMP, Chinese, and TwinsUK cohorts were sequentially added using cd-hit-est-2d (same parameters as cd-hit-est). A final redundancy removal step was performed. This process resulted in a catalog of 8.4 million non-redundant genes.

MSPs Recovery

The 689 metagenomic samples were aligned against the final gene catalog using the Meteor software suite to produce a gene abundance table. Then, co-abundant genes were binned into 853 Metagenomic Species Pangenomes (MSPs) using MSPminer.

MSPs Taxonomic Annotation

Taxonomic annotation for the MSPs was performed by aligning all core and accessory genes against representative genomes from the GTDB database (release r214) using blastn (task: megablast, word_size: 16). A species-level assignment was given if over 50% of the genes matched a representative genome with a mean nucleotide identity of at least 95% and a mean gene length coverage of at least 90%. The remaining MSPs were assigned to a higher taxonomic level (genus to superkingdom) if more than 50% of their genes shared the same annotation.
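
The taxonomic assignment rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `assign_taxonomy`, the hit tuple layout, the `lineages` dictionary, and the choice to compute the means over the genes hitting a given genome are all hypothetical, and the sketch assumes at most one best blastn hit per gene.

```python
from collections import Counter, defaultdict
from statistics import mean

# Higher ranks tried from most to least specific when no species is assigned.
RANKS = ["genus", "family", "order", "class", "phylum", "superkingdom"]


def assign_taxonomy(n_genes, hits, lineages):
    """Return (rank, taxon) for one MSP, or None if no rule is satisfied.

    n_genes  -- total number of core and accessory genes in the MSP
    hits     -- list of (gene_id, genome_id, pct_identity, pct_coverage);
                assumed to hold at most one best hit per gene
    lineages -- dict genome_id -> dict rank -> taxon name (hypothetical layout)
    """
    by_genome = defaultdict(list)
    for _gene, genome, identity, coverage in hits:
        by_genome[genome].append((identity, coverage))

    # Species level: more than 50% of the genes hit one representative genome,
    # with mean identity >= 95% and mean gene length coverage >= 90%.
    best = None
    for genome, rows in by_genome.items():
        frac = len(rows) / n_genes
        if (frac > 0.5
                and mean(r[0] for r in rows) >= 95.0
                and mean(r[1] for r in rows) >= 90.0):
            if best is None or frac > best[0]:
                best = (frac, genome)
    if best is not None:
        return ("species", lineages[best[1]]["species"])

    # Fallback: the most specific higher rank shared by more than 50% of genes.
    for rank in RANKS:
        counts = Counter(lineages[genome][rank] for _g, genome, _i, _c in hits)
        if counts:
            taxon, n = counts.most_common(1)[0]
            if n / n_genes > 0.5:
                return (rank, taxon)
    return None
```

The denominator is the full gene count of the MSP, so genes without any hit count against both thresholds. Whether the original analysis treated unaligned genes this way is not stated in the abstract.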

Keywords

Human Health and Pathology, Health and Life Sciences, Microbial Ecology and Applied Microbiology, Medicine, Microorganisms, Omics, Life Sciences, Biology, Pathology and Forensic Medicine

  • Impact by BIP!
    selected citations: 7
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Top 10%
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.