ZENODO
Dataset . 2016
License: CC 0
Data sources: ZENODO
DRYAD
Dataset . 2016
License: CC 0
Data sources: Datacite

Data from: Paralogs are revealed by proportion of heterozygotes and deviations in read ratios in genotyping by sequencing data from natural populations

Authors: McKinney, Garrett J.; Waples, Ryan K.; Seeb, Lisa W.; Seeb, James E.

Abstract

HDplot R code (HDplot.R): Code to run HDplot in R with generic format as input. Example input file is HDplot_R_genericInput.

HDplot_R_genericInput: Example input file for HDplot R package. Each row is a locus entry, there are five columns of data associated with each locus. Column locus_ID contains the locus name. Sequence reads for each allele are given in columns depth_a and depth_b. Sequence reads for each allele are summed over heterozygous individuals for each locus. The num_hets column indicates the number of heterozygous individuals for each locus while the num_samples column contains the total number of individuals in the data set.

HDplot_python: Python code to run HDplot. This can take input directly from the Stacks program in the form of the .vcf output from Stacks. The file HDplot_python_exampleInput.vcf is included as an example of the required format. The file vcf_to_depth.py is necessary to run HDplot_python and must be included in the same directory.

vcf_to_depth: Python package called by HDplot_python.py that extracts sequence read counts from the .vcf format input.

HDplot_python_exampleInput: Example .vcf format input for the HDplot_python.py program.

HDplot_simulation: R code to simulate data for HDplot. This code also runs HDplot internally and produces plots showing expected distributions based on simulation parameters. Simulation parameters include the number of singleton, duplicate, and diverged duplicate loci, the total population size, the sampled population size, average read depth per locus, statistical distribution of reads per locus to sample from and distribution of reads per allele to sample from.

Chinook_sequenceReads: Chinook salmon dataset processed with HDplot in this manuscript. File is in the .vcf format output by the Stacks genotyping program.

Barberry_sequenceReads: Mountain Barberry dataset processed with HDplot in this manuscript. File is in the .vcf format output by the Stacks genotyping program.

Parrotfish_sequenceReads: Dusky Parrotfish dataset processed with HDplot in this manuscript. File is in the .vcf format output by the Stacks genotyping program.
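
As an illustration of the generic input layout described above, the following Python sketch reads a table with the five named columns and derives two per-locus summaries: the proportion of heterozygotes and the allele read ratio. The column names come from the description; the tab delimiter, the example values, and the function names read_loci and summarize are assumptions made for this sketch and are not taken from the deposited files.

```python
import csv
import io

# Hypothetical two-locus table. Column names follow the description above;
# the tab delimiter and all values are assumptions for illustration only.
EXAMPLE = (
    "locus_ID\tdepth_a\tdepth_b\tnum_hets\tnum_samples\n"
    "locus_1\t500\t500\t20\t100\n"
    "locus_2\t750\t250\t60\t100\n"
)


def read_loci(handle, delimiter="\t"):
    """Parse generic-format rows into dicts with integer count fields."""
    loci = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        loci.append({
            "locus_ID": row["locus_ID"],
            "depth_a": int(row["depth_a"]),
            "depth_b": int(row["depth_b"]),
            "num_hets": int(row["num_hets"]),
            "num_samples": int(row["num_samples"]),
        })
    return loci


def summarize(locus):
    """Return heterozygote proportion and allele-a read ratio for one locus."""
    total_reads = locus["depth_a"] + locus["depth_b"]
    return {
        "locus_ID": locus["locus_ID"],
        # Share of sampled individuals that are heterozygous at this locus.
        "het_proportion": locus["num_hets"] / locus["num_samples"],
        # Share of heterozygote reads carrying allele a (None if no reads).
        "read_ratio": locus["depth_a"] / total_reads if total_reads else None,
    }


summaries = [summarize(locus) for locus in read_loci(io.StringIO(EXAMPLE))]
for s in summaries:
    print(s["locus_ID"], s["het_proportion"], s["read_ratio"])
```

To read a real file instead of the inline example, an open file handle can be passed to read_loci in place of the io.StringIO object, with the delimiter adjusted to match the file.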

Whole genome duplications have occurred in the recent ancestors of many plants, fish, and amphibians, resulting in a pervasiveness of paralogous loci and the potential for both disomic and tetrasomic inheritance in the same genome. Paralogs can be difficult to reliably genotype and are often excluded from genotyping-by-sequencing (GBS) analyses; however, removal requires paralogs to be identified, which is difficult without a reference genome. We present a method for identifying paralogs in natural populations by combining two properties of duplicated loci: 1) the expected frequency of heterozygotes exceeds that for singleton loci, and 2) within heterozygotes, observed read ratios for each allele in GBS data will deviate from the 1:1 expected for singleton (diploid) loci. These deviations are often not apparent within individuals, particularly when sequence coverage is low; however, we postulated that summing allele reads for each locus over all heterozygous individuals in a population would provide sufficient power to detect deviations at those loci. We identified paralogous loci in three species: Chinook salmon (Oncorhynchus tshawytscha), which retains regions with ongoing residual tetrasomy on eight chromosome arms following a recent whole genome duplication; mountain barberry (Berberis alpina), which has a large proportion of paralogs that arose through an unknown mechanism; and dusky parrotfish (Scarus niger), which has largely re-diploidized following an ancient whole genome duplication. Importantly, this approach only requires the genotype and allele-specific read counts for each individual, information which is readily obtained from most GBS analysis pipelines.
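
The second property in the abstract, deviation of summed read ratios from 1:1, can be expressed numerically in several ways. The sketch below uses one plausible choice: a z-score of the summed allele-a reads against a binomial expectation with probability 0.5. This statistic, the function name read_ratio_deviation, and the example counts are assumptions for illustration; the deposited HDplot code may define deviation differently.

```python
import math


def read_ratio_deviation(depth_a, depth_b, expected=0.5):
    """Z-score of allele-a reads against an expected allele ratio.

    Assumes reads summed over heterozygotes follow a binomial distribution
    with success probability `expected` (0.5 for a 1:1 ratio). This is an
    illustrative assumption, not a statement of the published method.
    """
    total = depth_a + depth_b
    if total == 0:
        raise ValueError("no reads summed over heterozygotes for this locus")
    mean = total * expected
    sd = math.sqrt(total * expected * (1.0 - expected))
    return (depth_a - mean) / sd


# Hypothetical counts: a balanced locus and one skewed toward allele a (3:1).
z_balanced = read_ratio_deviation(500, 500)
z_skewed = read_ratio_deviation(750, 250)
print(z_balanced, z_skewed)
```

Summing reads across heterozygotes before computing the statistic is what gives the comparison its power: the standard deviation grows with the square root of the total read count, so a fixed skew in the ratio yields a larger z-score as more reads are pooled.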

Keywords

natural populations, Berberis alpina, Oncorhynchus tshawytscha, genome duplication, genotyping-by-sequencing, Scarus niger, Chinook salmon, paralog
