
Unmasking new intra-species diversity through K-mer count analysis

Authors: Pérez Cantalapiedra, Carlos; Contreras-Moreira, Bruno; Casas Cendoya, Ana María; Igartua Arregui, Ernesto

Abstract

High-throughput sequencing is often used to examine intra-species diversity. Most studies focus on calling and genotyping SNPs. Other kinds of genomic variation, such as copy-number variation (CNV), are exploited more rarely, despite literature reports linking them to phenotypic differences. For some loci, it is difficult to identify reliable SNPs. For instance, reads from closely related sequences (e.g. paralogous genes) will often pile up at the same location when some of those loci are absent from the reference sequence. Such piled-up mappings produce abundant spurious heterozygous SNPs, and have thus been called apparent heterozygous mappings (AHMs). To avoid wrong conclusions from false positive calls, SNPs from AHMs are often discarded, either early in the analysis (e.g. in samples expected to be homozygous) or in downstream steps (e.g. when incoherent haplotype blocks are identified). This leads to information loss at certain loci. AHMs can be seen as a kind of CNV that is specific to non-identical copies. Unmasking such variation could help to i) assess the completeness of a genome or pan-genome reference, ii) confirm results from other CNV genotyping methods, when the copies originate in non-identical loci, iii) provide hints about the history and behavior of duplicating DNA loci, and iv) reveal novel intra-species genetic diversity. Here we present a software pipeline, kmeleon, available at https://github.com/eead-csic-compbio/kmeleon, designed to identify regions harboring AHMs. kmeleon is based on mappings, and thus it can be used for both homozygous and heterozygous samples. First, the different k-mers (sequences of length k) mapping to a single locus are identified and counted. Then, loci are classified based on the presence or absence of AHMs. From the resulting intervals, it is straightforward to perform comparisons between genotypes, or to transfer existing annotation to the regions with AHMs.
We used exome capture data to detect AHMs in a set of barley accessions. We included the cultivar Morex, the genotype of the genome reference, as a control sample. As expected, it had the lowest number of AHMs, although some were still detectable. For all accessions, AHMs were found both in inter- and intragenic loci. Enrichment analysis showed that NBS-LRR proteins were overrepresented at AHMs, whereas PPR proteins were depleted. We will also show that AHMs can be used to infer phylogenetic trees congruent with those produced by SNP-based approaches, which supports the value of this hidden variability for describing genetic relationships.
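
The two steps described in the abstract (counting the distinct k-mers that map to each locus, then classifying loci by the presence or absence of AHMs) can be illustrated with a minimal Python sketch. This is not kmeleon's actual code: the function names, the gapless `(start, sequence)` alignment representation, and the `min_count` and `max_kmers` thresholds are all assumptions made for illustration, with `max_kmers=1` standing in for a sample expected to be homozygous.

```python
from collections import Counter, defaultdict


def kmers_by_position(alignments, k):
    """Count the k-mers observed at each reference position.

    `alignments` is an iterable of (start, sequence) pairs, where `start`
    is the 0-based reference coordinate of a gapless alignment.
    This input format is a simplifying assumption, not kmeleon's.
    """
    counts = defaultdict(Counter)
    for start, seq in alignments:
        for offset in range(len(seq) - k + 1):
            counts[start + offset][seq[offset:offset + k]] += 1
    return counts


def flag_positions(counts, min_count=2, max_kmers=1):
    """Return sorted positions where more than `max_kmers` distinct k-mers
    are each supported by at least `min_count` reads.

    Both thresholds are hypothetical illustration values.
    """
    flagged = []
    for pos, counter in counts.items():
        supported = [kmer for kmer, n in counter.items() if n >= min_count]
        if len(supported) > max_kmers:
            flagged.append(pos)
    return sorted(flagged)


def merge_intervals(positions, k):
    """Merge sorted flagged positions into half-open intervals [start, end),
    where each flagged position covers the k bases of its k-mer."""
    intervals = []
    for pos in positions:
        if intervals and pos <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], pos + k)
        else:
            intervals.append([pos, pos + k])
    return [tuple(interval) for interval in intervals]
```

Calling `merge_intervals(flag_positions(kmers_by_position(reads, k)), k)` on the reads of one sample would yield candidate intervals that could then be compared between genotypes or intersected with an annotation. A real pipeline would also need to handle indels, clipping, base qualities and sequencing errors, which this sketch ignores.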

1 .pdf copy (3 Figs.) of the authors' original poster. Creative Commons License Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

This study was financially supported by the Spanish Ministry of Economy, Industry and Competitiveness (Projects AGL2013-48756-R and AGL2016-80967-R).

Peer reviewed

Keywords

Exome Capture, Genotyping, Gene Families, Presence-Absence Variation, K-mer Analysis, Pangenomics, Barley, Sequencing Plant Genomics, Copy Number Variations (CNV), NBS-LRR, Pentatricopeptide
