Background: In most flowering plants, the plastid genome exhibits a quadripartite structure, comprising a large and a small single-copy region as well as two inverted repeat regions. Thousands of plastid genomes have been sequenced and submitted to public sequence repositories in recent years. The quality of sequence annotations in many of these submissions is known to be problematic, especially regarding annotations that specify the length and location of the inverted repeats: such annotations are either missing or portray the length or location of the repeats incorrectly. Many biological investigations take publicly available plastid genomes at face value and implicitly assume that their sequence annotations are correct.

Results: We developed a software tool, named airpg, that automatically assesses the frequency of incomplete or incorrect annotations of the inverted repeats among publicly available plastid genomes. Specifically, the tool automatically retrieves complete plastid genomes from NCBI Nucleotide under variable search parameters, surveys their annotations, and parses information on the length and position of their inverted repeat regions. The package also includes functionality for the automatic identification and removal of duplicate genome records and accounts for taxa that genuinely lack inverted repeats.

Conclusions: We used airpg to survey the presence of inverted repeat annotations among all plastid genomes of flowering plants stored on NCBI Nucleotide, and then tested statistically for associations with record metadata. The analysis showed that the release year and publication status of the genome records have a significant effect on the frequency of complete and correct repeat annotations.
sequence annotations, software, plastid genomes, inverted repeats, data mining, NCBI Nucleotide, Python
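The annotation survey described in the abstract can be illustrated with a minimal sketch. It is not airpg's actual code: the helper names (`parse_features`, `find_inverted_repeats`), the sample record, and the rule for recognising an inverted repeat (a `repeat_region` feature with `/rpt_type=inverted` or a note mentioning "inverted repeat") are all assumptions made for illustration. Real records vary more widely in how they label the repeats. The sketch reads a GenBank-style feature table that is already in memory and reports the position and length of each annotated inverted repeat.

```python
import re

# Simple locations only, e.g. "86500..112000" or "complement(130000..155500)".
# Joined or fuzzy locations are ignored in this sketch.
LOCATION_RE = re.compile(r"(?:complement\()?(\d+)\.\.(\d+)\)?")


def parse_features(feature_table):
    """Parse a GenBank-style feature table into a list of feature dicts."""
    features = []
    for raw in feature_table.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # Qualifier line: attach it to the most recent feature.
            if features:
                name, _, value = line[1:].partition("=")
                features[-1]["qualifiers"][name] = value.strip('"')
            continue
        parts = line.split()
        if len(parts) == 2:
            match = LOCATION_RE.fullmatch(parts[1])
            if match:
                features.append({
                    "key": parts[0],
                    "start": int(match.group(1)),
                    "end": int(match.group(2)),
                    "qualifiers": {},
                })
    return features


def find_inverted_repeats(feature_table):
    """Return (start, end, length) for each feature that looks like an IR.

    Assumed heuristic, not airpg's rule set: a repeat_region whose rpt_type
    is "inverted" or whose note mentions "inverted repeat".
    """
    repeats = []
    for feature in parse_features(feature_table):
        if feature["key"] != "repeat_region":
            continue
        qualifiers = feature["qualifiers"]
        is_inverted = (
            qualifiers.get("rpt_type", "").lower() == "inverted"
            or "inverted repeat" in qualifiers.get("note", "").lower()
        )
        if is_inverted:
            length = feature["end"] - feature["start"] + 1
            repeats.append((feature["start"], feature["end"], length))
    return repeats


# Hypothetical feature table; coordinates are invented for illustration.
SAMPLE = """\
     gene            complement(100..1161)
                     /gene="psbA"
     repeat_region   86500..112000
                     /rpt_type=inverted
                     /note="inverted repeat B"
     repeat_region   complement(130000..155500)
                     /note="Inverted Repeat A"
"""

if __name__ == "__main__":
    for start, end, length in find_inverted_repeats(SAMPLE):
        print(f"IR at {start}..{end}, length {length} bp")
```

In this sample both repeats have the same length, which is what a correctly annotated pair should show. A record that yields no repeats, only one, or two of unequal length would be a candidate for the incomplete or incorrect annotations that the abstract describes.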
