
Database assembly, curation, and taxonomic annotation of genomes

To create a reference database for classifying environmental sequences, we assembled a set of putatively diazotrophic genomes from the Genome Taxonomy Database (GTDB r214; Parks et al. 2022). Species-representative genomes containing any of nifH, nifD, or nifK were identified with AnnoTree (Mendler et al. 2019). Because many genomes with nifH (or homologous genes) contain no other nif genes (Mise et al. 2021), genomes lacking any of the three nifHDK genes were assumed not to be “true” diazotrophs and were discarded. This left 2798 genomes (3.3% of GTDB representative genomes) carrying the full nifHDK set; these were assumed to be capable of N2 fixation and constitute the “DiazoTIME” database.

To assess the metabolic capabilities of these diazotrophs, we annotated the metabolic genes of each genome with METABOLIC v4 (Zhou et al. 2022). METABOLIC identifies key functional pathways by aggregating the results of genome searches against Hidden Markov Models (HMMs) from KOFam (Kanehisa et al. 2023), TIGR (Li et al. 2021), and select custom models. These gene annotations were used to assign genomes to broad metabolic categories, focused on energy production and carbon sources.

References

Kanehisa M, Furumichi M, Sato Y, Kawashima M, Ishiguro-Watanabe M. 2023. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Research 51:D587–D592.

Li W, O’Neill KR, Haft DH, DiCuccio M, Chetvernin V, Badretdin A, Coulouris G, Chitsaz F, Derbyshire MK, Durkin AS, Gonzales NR, Gwadz M, Lanczycki CJ, Song JS, Thanki N, Wang J, Yamashita RA, Yang M, Zheng C, Marchler-Bauer A, Thibaud-Nissen F. 2021. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Research 49:D1020–D1028.

Mendler K, Chen H, Parks DH, Lobb B, Hug LA, Doxey AC. 2019. AnnoTree: visualization and exploration of a functionally annotated microbial tree of life. Nucleic Acids Research 47:4442–4448. doi: 10.1093/nar/gkz246.

Mise K, Masuda Y, Senoo K, Itoh H. 2021. Undervalued pseudo-nifH sequences in public databases distort metagenomic insights into biological nitrogen fixers. mSphere 6:e00785-21.

Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P. 2022. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research 50:D785–D794.

Zhou Z, Tran PQ, Breister AM, Liu Y, Kieft K, Cowley ES, Karaoz U, Anantharaman K. 2022. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome 10:33.
| Description | File Name |
| --- | --- |
| Genome metadata (accession number, taxonomy, metabolic prediction) | DiazoTIME_GTDBr214_taxonomy_and_METABOLIC.xlsx |
| List of genomes from GTDB r214 with all 3 nif genes (nifH, nifD, nifK) | GTDB_r214_AnnoTree_genome_Nifs_N2fixation_potential.xlsx |
| METABOLIC program output | METABOLIC_raw_outputs.xlsx |
| nifH, nifD, nifK nucleotide sequences | gtdb_r214_nifHDK_with_tax.fna.zip |
| nifH, nifD, nifK amino acid sequences | gtdb_r214_nifHDK_with_tax.faa.zip |
| Full genomes nucleotide sequences | gtdb_diazotroph_genome_full_fnas.tar.gz |
| Dictionary linking NCBI and GTDB accessions | combined_gtdb_r214_genome_contigs_dict.txt |
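The selection rule used to build the database (keep a genome only if nifH, nifD, and nifK are all present) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the (genome, gene) input layout, and the genome identifiers are hypothetical, chosen only to show the logic on toy data.

```python
from typing import Dict, Iterable, List, Set, Tuple

# The three nitrogenase genes that must all be present for a genome to be kept.
REQUIRED_NIF_GENES = frozenset({"nifH", "nifD", "nifK"})


def group_hits_by_genome(hits: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect (genome_id, gene) pairs into a genome -> set-of-genes mapping."""
    genes_by_genome: Dict[str, Set[str]] = {}
    for genome_id, gene in hits:
        genes_by_genome.setdefault(genome_id, set()).add(gene)
    return genes_by_genome


def genomes_with_full_nif(genes_by_genome: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted genome IDs whose gene set includes nifH, nifD and nifK."""
    return sorted(
        genome_id
        for genome_id, genes in genes_by_genome.items()
        if REQUIRED_NIF_GENES <= genes  # subset test: all three genes present
    )


# Toy input with made-up genome IDs (assumption: one row per gene hit).
toy_hits = [
    ("genome_A", "nifH"), ("genome_A", "nifD"), ("genome_A", "nifK"),
    ("genome_B", "nifH"),                        # nifH only: discarded
    ("genome_C", "nifD"), ("genome_C", "nifK"),  # missing nifH: discarded
]

kept = genomes_with_full_nif(group_hits_by_genome(toy_hits))
print(kept)  # ['genome_A']
```

The subset comparison makes the rule explicit: a genome carrying only nifH, such as the pseudo-nifH cases noted by Mise et al. (2021), is excluded no matter how many nifH copies it has.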
The Diazotroph Taxonomic Identity and MEtabolism (DiazoTIME) database contains annotated taxonomy and metabolic predictions for the 2798 genomes in the Genome Taxonomy Database (GTDB r214; Parks et al. 2022) that contain all three of nifH, nifD, and nifK. It is intended as a reference for studies of diazotroph biodiversity, environmental distribution, and functional potential.
microorganism, diazotroph, biological nitrogen fixation, genome
