
Abstract

Background
Retrotransposons are an abundant component of eukaryotic genomes. The high quality of the Arabidopsis thaliana genome sequence makes it possible to comprehensively characterize retroelement populations and explore factors that contribute to their genomic distribution.

Results
We identified the full complement of A. thaliana long terminal repeat (LTR) retroelements using RetroMap, a software tool that iteratively searches genome sequences for reverse transcriptases and then defines retroelement insertions. Relative ages of full-length elements were estimated by assessing sequence divergence between LTRs: the Pseudoviridae were significantly younger than the Metaviridae. All retroelement insertions were mapped onto the genome sequence and their distribution was distinctly non-uniform. Although both Pseudoviridae and Metaviridae tend to cluster within pericentromeric heterochromatin, this association is significantly more pronounced for all three Metaviridae sublineages (Metavirus, Tat and Athila). Among these, Tat and Athila are strictly associated with pericentromeric heterochromatin.

Conclusions
The non-uniform genomic distribution of the Pseudoviridae and the Metaviridae can be explained by a variety of factors including target-site bias, selection against integration into euchromatin and pericentromeric accumulation of elements as a result of suppression of recombination. However, comparisons based on the age of elements and their chromosomal location indicate that integration-site specificity is likely to be the primary factor determining distribution of the Athila and Tat sublineages of the Metaviridae. We predict that, like retroelements in yeast, the Athila and Tat elements target integration to pericentromeric regions by recognizing a specific feature of pericentromeric heterochromatin.
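The relative-age estimate mentioned in the Results rests on the fact that the two LTRs of an element are identical at the time of insertion and diverge afterwards, so lower divergence implies a younger insertion. The abstract does not say which distance measure was used; the following is a minimal sketch, assuming the two LTRs are already aligned to equal length, that reports both the raw proportion of differing sites and the Kimura two-parameter distance. The function name `ltr_divergence` and the example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ltr_divergence(ltr5, ltr3):
    """Return (p_distance, k2p_distance) for two pre-aligned LTR sequences.

    Assumption: the inputs are already aligned and of equal length.
    Columns containing a gap or an ambiguous base are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")

    compared = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not comparable
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    P = transitions / compared
    Q = transversions / compared
    # Kimura two-parameter distance: -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    inner = (1 - 2 * P - Q) * math.sqrt(max(1 - 2 * Q, 0.0))
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return P + Q, -0.5 * math.log(inner)


if __name__ == "__main__":
    # Hypothetical 10 bp alignment with a single C->T transition.
    p, k2p = ltr_divergence("ACGTACGTAC", "ACGTACGTAT")
    print(f"p-distance = {p:.3f}, K2P = {k2p:.4f}")
```

Ranking elements by either distance gives a relative ordering only; converting divergence to absolute time would additionally require an assumed substitution rate, which the abstract does not provide.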
570; Time Factors; Models, Genetic; Retroelements; Applied Statistics; Statistical Models; Research; Virus Integration; Arabidopsis; Terminal Repeat Sequences; Computational Biology; Genetics and Genomics; Genomics; Chromosomes, Plant; Plant Viruses; Cell and Developmental Biology; Mutagenesis, Insertional; Gene Targeting
