Data sources

This dataset was constructed using metagenomic sequencing data from the BioProject PRJEB20308 from Coelho et al. 2018 (129 samples).

Metagenomic assembly

First, sequencing adapter removal and read trimming were performed with fastp. Reads that mapped to the host genome (ROS_Cfam_1.0, GCF_014441545.1) with bowtie2 were removed with samtools. Finally, metagenomic assembly was performed with metaSPAdes. Contigs shorter than 1500 bp were removed.

MAGs recovery

MAGs were generated with COMEBin (multi-coverage mode) and MAG quality was assessed with CheckM2. MAGs with completeness < 70%, contamination > 5% or N50 < 5 kb were discarded. Pairwise Average Nucleotide Identity (ANI) was computed for all recovered MAGs with fastANI, and MAGs were dereplicated at species level (ANI cutoff = 95%).

Non-redundant gene catalog

Genes were predicted on all contigs from the metagenomic assemblies with Prodigal (parameters: -m -p meta). Genes were pooled and clustered with cd-hit-est (parameters: -c 0.95 -aS 0.90 -G 0 -d 0 -M 0 -T 0), choosing genes from the longest contigs as cluster representatives.

MSPs recovery

Reads were aligned against the non-redundant gene catalog with the Meteor software suite to produce a raw gene abundance table (1.0M genes quantified in 129 samples). Then, co-abundant genes were binned into 234 Metagenomic Species Pan-genomes (MSPs, i.e. gene clusters that likely belong to the same microbial species) using MSPminer.

MAGs and MSPs taxonomic annotation

Dereplicated MAGs were annotated with GTDB-Tk based on GTDB r220. Then, the MAG taxonomic annotation was propagated to the corresponding MSPs.

Construction of the phylogenetic tree

39 universal phylogenetic marker genes were extracted from the dereplicated MAGs with fetchMGs. Then, the markers were separately aligned with MUSCLE. The 40 alignments were merged and trimmed with trimAl (parameters: -automated1).
Finally, the phylogenetic tree was computed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).

Mapping rate distribution across public cohorts

We generated mapping rate distribution plots using Meteor2 (default parameters), comparing performance between PRJEB20308 (the cohort used for catalogue assembly) and PRJNA714112 (an independent cohort not used in the assembly).
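
Illustration of the MAG quality filter

The MAG quality thresholds stated above (completeness, contamination, N50) can be illustrated with a minimal Python sketch. This is not the code used to build the dataset. The `MagStats` class, its field names and the four example MAGs are hypothetical and do not reflect the CheckM2 output format, and 5 kb is assumed to mean 5000 bp. N50 is computed with its usual definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, reaches half of the total assembly length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MagStats:
    """Per-MAG summary. Field names are hypothetical, not CheckM2 columns."""
    name: str
    completeness: float    # percent
    contamination: float   # percent
    contig_lengths: tuple  # contig lengths in bp


def n50(lengths):
    """Return the contig length at which the cumulative sum of lengths,
    taken from longest to shortest, reaches half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def passes_quality(mag, min_completeness=70.0, max_contamination=5.0,
                   min_n50=5000):
    """Keep a MAG unless completeness < 70%, contamination > 5%
    or N50 < 5 kb (assumed here to be 5000 bp)."""
    return (mag.completeness >= min_completeness
            and mag.contamination <= max_contamination
            and n50(mag.contig_lengths) >= min_n50)


# Made-up example values, for illustration only.
mags = [
    MagStats("mag_a", 92.1, 1.3, (40000, 12000, 8000, 3000)),
    MagStats("mag_b", 65.0, 0.5, (50000, 30000)),            # low completeness
    MagStats("mag_c", 88.0, 7.2, (20000, 15000)),            # high contamination
    MagStats("mag_d", 75.0, 2.0, (4000, 3500, 3000, 2500)),  # low N50
]

kept = [m.name for m in mags if passes_quality(m)]
print(kept)
```

In this toy input only `mag_a` satisfies all three criteria. Each of the other MAGs fails exactly one threshold.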