
pmid: 19233962
The concept of genome signature allows sequence comparisons without alignment. It relies on the premise that oligonucleotide compositions of DNA segments from the same or closely related genomes tend to be more similar than those from distantly related genomes. This concept has been used to detect lateral gene transfer, to classify metagenome sequences phylogenetically (binning), and to study the evolution of viruses and plasmids. The goal of this work is to explore the limitations of genome signature in phylogenetic classification of DNA sequences and to identify the formal representations of genome signature that best expose the phylogenetic relationships among prokaryotes. We found that the genome signatures that best represent phylogenetic relationships are those normalized to factor out differences in G + C content and based on either the standard A-C-G-T alphabet or the degenerate R-Y (purine-pyrimidine) alphabet. The main limitation of all genome signature representations tested is a lack of divergence among some distantly related species. "Crowding" of the genome signature space and the absence of a molecular clock likely contribute to this phenomenon. We introduce "periodicity signatures", formal representations of periodic sequence patterns related to DNA curvature, which can discriminate between bacterial and archaeal DNA sequences. Interestingly, archaea of the family Halobacteriaceae have periodicity signatures similar to those of bacteria, possibly because of their early divergence from other archaea, extensive lateral gene transfer, or adaptation to high-salt environments. Our results have practical implications for the development and application of genome signature-based methods for the analysis and classification of DNA sequences.
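To make the idea of a G + C-normalized genome signature concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes one common normalization: each observed oligonucleotide frequency is divided by the frequency expected from single-letter composition alone, so that base composition is factored out. The function names (`to_ry`, `signature`, `distance`), the default word length `k=2`, and the mean absolute difference used as a distance are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def to_ry(seq):
    """Recode a DNA string into the degenerate purine (R) / pyrimidine (Y) alphabet."""
    return seq.upper().translate(str.maketrans("AGCT", "RRYY"))


def signature(seq, k=2, alphabet="ACGT"):
    """Observed/expected ratio for every k-mer over `alphabet`.

    Assumption: the expected frequency of a word is the product of the
    frequencies of its letters, which factors out base composition.
    """
    seq = seq.upper()
    mono = Counter(c for c in seq if c in alphabet)
    total = sum(mono.values())
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    freq = {c: mono[c] / total for c in alphabet}

    # Count overlapping k-mers, skipping words with ambiguous symbols (e.g. N).
    kmers = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in alphabet for c in word):
            kmers[word] += 1
    n = sum(kmers.values())

    sig = {}
    for word in map("".join, product(alphabet, repeat=k)):
        expected = 1.0
        for c in word:
            expected *= freq[c]
        if n > 0 and expected > 0:
            sig[word] = (kmers[word] / n) / expected
        else:
            sig[word] = 0.0
    return sig


def distance(sig_a, sig_b):
    """Mean absolute difference between two signatures over the same words."""
    return sum(abs(sig_a[w] - sig_b[w]) for w in sig_a) / len(sig_a)
```

Under these assumptions, two segments can be compared with `distance(signature(a), signature(b))`, and the R-Y variant with `signature(to_ry(a), alphabet="RY")`. Real analyses typically also count both DNA strands and use much longer sequences than a toy string.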
Base Composition, Principal Component Analysis, Genome, Time Factors, Base Sequence, DNA, Chromosomes, Prokaryotic Cells, Base Pairing, Phylogeny
