
Abstract

Nucleotide-binding leucine-rich repeat receptors (NLRs) are critical components of plant immune systems, responsible for detecting pathogens and initiating defence responses. As part of our exploration of NLR protein diversity across a broad spectrum of plant species, we created a comprehensive NLRome dataset by analyzing 180 reference plant genomes from the NCBI RefSeq database (Pruitt et al. 2007). This database includes high-quality genome annotations for species from a wide phylogenetic range, encompassing algae, gymnosperms, early flowering plants, monocots, and dicots (https://www.ncbi.nlm.nih.gov/refseq/). Using NLRtracker, a specialized bioinformatics tool that integrates InterProScan for domain identification, we extracted and catalogued NLR proteins across these diverse genomes. Based on the NLR definition of RefPlantNLR and NLRtracker (Kourelis et al. 2021), 169 of the 180 species had at least one predicted NLR. In total, we catalogued 113,686 NLRs, ranging from 33 in Cucurbita maxima to 4155 in Quercus robur. In addition to NLR annotation, NLRtracker provided functional annotations for the entire proteome of each species, enabling comparative genomics and evolutionary studies.

NLRtracker output legend (file extension: description)

*_NLRtracker.tsv: NLRtracker overview output with gene status.
*_NLR.lst: Identifier list of NLRs.
*_NLR.gff3: NLR annotation of motifs, domains, and regions in GFF3 format.
*_NLR.fasta: NLR FASTA sequences.
*_NLR-associated.lst: Identifier list of NLR-associated genes.
*_NLR-associated.gff3: NLR-associated gene annotation of motifs, domains, and regions in GFF3 format.
*_NLR_associated.fasta: NLR-associated gene FASTA sequences.
*_NBARC.fasta: NB-ARC domain FASTA sequences.
*_NBARC_deduplictated.fasta: Deduplicated NB-ARC domain FASTA sequences.
*_iTOL.txt: Domain annotation file for iTOL.
*_iTOL_dedup.txt: Domain annotation file of the deduplicated sequences for iTOL.
*_Domains.tsv: Full-length and domain sequences and metadata for all NLRtracker output.
interpro_result.gff: InterProScan output of the query proteome.

Supplementary Data

Data S1. RefSeq species list and metadata.
Data S2. Per-genome sequence count statistics for proteomes, total NLRs, and putative NLR types determined by NLRtracker.
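
As an illustration of how the identifier lists in the legend might be used, the following minimal Python sketch tallies NLR identifiers per `*_NLR.lst` file. It rests on assumptions not stated in this record: that all list files sit in a single directory, that each file holds one identifier per line, and that the text before `_NLR.lst` is a usable per-species prefix. The function name `count_nlrs` is hypothetical and is not part of NLRtracker.

```python
from pathlib import Path


def count_nlrs(output_dir):
    """Return {file prefix: number of unique NLR identifiers}.

    Assumption (not confirmed by the dataset description): each
    '*_NLR.lst' file lists one identifier per line.
    """
    suffix = "_NLR.lst"
    counts = {}
    # The pattern matches only names ending in '_NLR.lst', so
    # '*_NLR-associated.lst' files are not picked up.
    for lst_path in sorted(Path(output_dir).glob("*" + suffix)):
        prefix = lst_path.name[: -len(suffix)]
        with lst_path.open() as handle:
            # Skip blank lines and collapse repeated identifiers.
            identifiers = {line.strip() for line in handle if line.strip()}
        counts[prefix] = len(identifiers)
    return counts
```

Calling `count_nlrs("path/to/output")` on a directory of NLRtracker results would give per-file counts that can be compared against the totals reported in Data S2, provided the layout assumptions above hold.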
