
Changes

These changes were made between v1 and v2:

SAG:
- contigs have been renamed and only those greater than 2.5 kb were kept
- gene prediction was done with GeneMark-EP+
- genes are available in GFF3, transcripts and proteins as FASTA sequences

The DNA-dependent RNA polymerase phylogenomic tree was updated with two SAGs from this dataset.

Mapping public metagenomes:
- rows corresponding to metagenomes that had fewer than 100 reads mapped were removed from the table
- column removed: "used_for_UMAP" (after the removal of some rows, this column contained only 'yes' values)
- columns added: "clean_bases", "Nb_MAG_detected_sup_10pct", "cluster"

A metadata table for MAGs and SAGs has been added; it contains the assembly statistics, BUSCO results and taxonomic affiliation.

Archive content

This Zenodo record contains the following items:

Single Amplified Genomes (SAGs)

The files present in the Zip archive SAG.zip are:
- genome/*.fa.gz: SAG genomes
- gene_prediction/.gff3.gz: gene predictions
- gene_prediction/.transcript.fa.gz: transcript sequences
- gene_prediction/.protein.fa.gz: protein sequences

A summary for each SAG is available in the spreadsheet MAG_SAG_metadata.ods.

Metagenome-Assembled Genomes (MAGs)

The files present in the Zip archive MAG.zip are:
- euk_mag/*.fa.gz: MAG genomes
- gene_prediction_final/..gff3.gz: gene predictions
- gene_prediction_final/..transcript.fa.gz: transcript sequences
- gene_prediction_final/..prot.fa.gz: protein sequences

Some MAGs for which the gene prediction failed (sequence too short or other reasons), and which were not used in the analysis with the mapping of public metagenomes, are available in the directory not_used/.

A summary for each MAG is available in the spreadsheet MAG_SAG_metadata.ods.

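
The contig-length cutoff stated for the SAGs in the change list can be checked on any decompressed or gzipped genome FASTA file. The sketch below is not the procedure used to build the archive; it is a minimal, standard-library-only illustration that assumes "2.5 kb" means 2500 bp and that "greater than" is strict. The function names and the demo contig names are hypothetical.

```python
import gzip
import io


def read_fasta_lengths(handle):
    """Yield (contig_id, length) for each record of a FASTA text handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # Keep only the first word of the header as the contig identifier.
            name, length = (parts[0] if parts else ""), 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def contigs_at_or_below(handle, min_length=2500):
    """Return contigs that are NOT strictly longer than min_length (assumed 2500 bp)."""
    return [(n, l) for n, l in read_fasta_lengths(handle) if l <= min_length]


def open_fasta(path):
    """Open a .fa.gz file in text mode (for example a file from genome/)."""
    return gzip.open(path, "rt")


# In-memory demonstration with two made-up contigs.
example = io.StringIO(
    ">contig_1 demo\n" + "A" * 3000 + "\n>contig_2\n" + "ACGT" * 100 + "\n"
)
short = contigs_at_or_below(example)
print(short)  # [('contig_2', 400)]
```

On a real file, `contigs_at_or_below(open_fasta(path))` would be expected to return an empty list if the assumptions above match the filtering that was applied.
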
Metagenomic assemblies and bins

The files present in the Zip archive metagenomic_assemblies.zip are:
- contigs_db/: the 4 anvi'o contigs databases that contain the contig sequences
- merged_profiles_db: the 4 anvi'o profile databases that contain the mapping results. The bins (MetaBAT2) are stored in these databases
- README_metagenomic_assemblies.md: a text file with more details

WARNING: files have been compressed in .bz2; they must be decompressed (bunzip2) before use. They use about 110 GB of disk space. Even though these artifacts were generated with a previous version of anvi'o, a script, anvi-migrate (see the anvi'o documentation), is available to continue using them with an up-to-date installation.

Phylogenetic tree

All files are provided in the Zip archive phylogenetic_tree.zip.

The DNA-dependent RNA polymerase

The files present in the directory phylogenetic_tree/DNA_dependent_RNA_pol are:
- 00_hmm_profiles/: the HMM profiles that target the DNA-dependent RNA polymerase sub-units
- 01_sequences/: best hit for each sub-unit
- 02_alignments_raw/: raw alignments, produced by MAFFT v7.526
- 03_alignments_cleaned/: alignments after goalign clean sites -c 0.5
- merge_alignments.py: Python script to concatenate the 6 sub-units
- RNAP_aln_v6_concat.fa: the concatenated alignment
- 04_tree/: the run of IQ-TREE v2.4.0
- metadata_RNAP_tree.tsv: metadata to decorate the tree

The Python script merge_alignments.py requires Python 3 and Biopython.

The file metadata_RNAP_tree.tsv lists the reference genomes. The "source" column corresponds to:
- Mendota: Krinos et al., 2024, Microbiome. MAGs are available at https://osf.io/9epa8/?view_only=152af26e11894ac0bcdfe542e02c6ab1
- public_database: EBI / NCBI / DDBJ
- METDB: https://metdb.sb-roscoff.fr/metdb/. DNA-dependent RNA polymerase sequences are available at https://www.genoscope.cns.fr/tara/, section Curated DNA-dependent RNA polymerase.
- Tara: Delmont et al., 2022, Cell Genomics.
DNA-dependent RNA polymerase sequences are available at https://www.genoscope.cns.fr/tara/, section Curated DNA-dependent RNA polymerase.

PhyloSift tree

The files present in the directory phylogenetic_tree/phylosift_tree/ are:
- 01_marker_present/: markers identified and aligned by PhyloSift
- 02_marker_selected/: markers selected for the tree, i.e. markers present in at least 50% of the genomes
- 03_marker_alignment_cleaned/: alignments cleaned by trimAl v1.5 with the parameter -automated1
- 04_phylosift_concatenated_alignment.fa: concatenation of the 50 markers
- 05_tree/: phylogeny built by IQ-TREE v2.2.3
- metadata_phylosift_tree.csv: list of reference genomes and their taxonomy. This file can be used directly to decorate the tree visualised with TreeViewer.

Taxonomy affiliation

The file phylogenetic_tree/taxonomy_affiliation_MAG_SAG.tsv summarises the taxonomic affiliation proposed for the MAGs and SAGs in which a sufficient number of markers were present. The final taxonomic affiliation is also available in the spreadsheet MAG_SAG_metadata.ods.

Mapping public metagenomes

The files present in the Zip archive mapping_public_metagenomes.zip are:

public_metagenomes_metadata.tsv: the metagenomes metadata table.
The columns correspond to:
- accession: identifier in the public databases, except for datasets of the project
- origin: where the metagenome was collected
- origin_simplified: simplified version of origin, as some names were long
- country: country in which the sample was taken
- broad_geo_region: the UN geoscheme code corresponding to the country (https://en.wikipedia.org/wiki/United_Nations_geoscheme)
- dataset: this project or public data
- DLATITUDE: latitude, in decimal degrees
- DLONGITUDE: longitude, in decimal degrees
- salinity: relation of the sample to salinity
- ECOSYSTEM.TYPE: type of ecosystem sampled
- MINIMUM.SIZE.FRACTION: when available, the pore size on which the genetic material was collected
- MAXIMUM.SIZE.FRACTION: when available, the pore size used to prefilter the sample
- SAMPLE.MATERIAL: nature of the sample, mostly water
- clean_reads: number of reads used for the mapping
- mapped_reads: number of reads that mapped on the MAGs and SAGs of this project
- filtered_reads: number of reads that passed the msamtools filters
- Nb_MAG_detected_sup_10pct: number of MAGs and SAGs that were detected at more than 10% (breadth of coverage) in the metagenome
- cluster: the cluster the metagenome belongs to

The table that summarises the read counts is public_metagenomes_read_count_on_MAGs_and_SAGs.tsv. The first column contains the MAG and SAG identifiers, and each of the other 3097 columns represents one public metagenome. The values are the number of reads mapped per MAG/SAG per public metagenome, after filtering the mapping with msamtools v1.1.0 and the parameters filter -b -l 50 -p 95 -z 80.

The file public_metagenomes_detection_of_MAGs_and_SAGs.tsv summarises the breadth of coverage, in percent, of each MAG/SAG in each public metagenome. A value of "100" means that all positions of a given genome are covered by at least one read from the given metagenome, and "0" means that no read from metagenome X mapped to MAG/SAG Y.

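
As an illustration of how the detection table relates to the Nb_MAG_detected_sup_10pct column, the sketch below counts, for each metagenome, how many genomes have a breadth of coverage strictly above 10%. It assumes that the detection file has the same layout as the read-count table (first column holding the MAG/SAG identifiers, one column per metagenome) and that it is tab-separated; the function name and the demo identifiers are hypothetical, and this is not necessarily how the column was computed.

```python
import csv
import io


def count_detected(handle, threshold=10.0, delimiter="\t"):
    """Count, per metagenome column, the genomes whose breadth exceeds threshold (%)."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    metagenomes = header[1:]  # assumed layout: first column = MAG/SAG identifier
    counts = {m: 0 for m in metagenomes}
    for row in reader:
        if not row:
            continue
        for metagenome, value in zip(metagenomes, row[1:]):
            try:
                breadth = float(value)
            except ValueError:
                continue  # skip empty or non-numeric cells
            if breadth > threshold:
                counts[metagenome] += 1
    return counts


# In-memory demonstration with made-up identifiers and values.
demo = io.StringIO(
    "genome\tmetaG_A\tmetaG_B\n"
    "MAG_demo_1\t100\t0\n"
    "MAG_demo_2\t10\t55.5\n"
    "SAG_demo_1\t12.3\t9.9\n"
)
print(count_detected(demo))  # {'metaG_A': 2, 'metaG_B': 1}

# On the real file (path relative to the unzipped archive):
# with open("public_metagenomes_detection_of_MAGs_and_SAGs.tsv", newline="") as fh:
#     counts = count_detected(fh)
```

The table is read row by row rather than loaded whole, which keeps memory use low for a file with several thousand columns.
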
Unigenes

The files stored in the Zip archive unigenes.zip are:
- unigenes_sequences.fa.gz: unigene sequences, cleaned of contamination (human, metazoans, bacteria, archaea and viruses)
- table_readCount.noHuman.noConta.noMetazoa.annot.tsv.gz: counts of mapped reads on the unigenes, plus functional annotations: KEGG KO, Pfam and GO (derived from Pfam)
- table_taxonomy.perUnigene.allUnigenes.tsv.gz: unigene taxonomic annotation

See also the work of Monjot et al., 2023.

Freshwater ecosystem, Metagenome, Environment, Lake
