Genome Research
Article
License: CC BY NC
Article . 1999 . Peer-reviewed

Analysis of Sequence-Tagged-Connector Strategies for DNA Sequencing

Authors: A. F. Siegel; B. Trask; J. C. Roach; G. G. Mahairas; L. Hood; G. van den Engh

Abstract

The BAC-end sequencing, or sequence-tagged-connector (STC), approach to genome sequencing involves sequencing the ends of BAC inserts to scatter sequence tags (STCs) randomly across the genome. Once any BAC or other large segment of DNA is sequenced to completion by conventional shotgun approaches, these STC tags can be used to identify a minimum tiling path of BAC clones overlapping the nucleation sequence for sequence extension. Here, we explore the properties of STC-sequencing strategies within a mathematical model of a random target with homologous repeats and imperfect sequencing technology, to understand how varying the model parameters affects the incidence of problem clones and the cost of the sequencing project. Problem clones are defined as clones for which either (A) there is no identifiable overlapping STC to extend the sequence in a particular direction or (B) the identified STC with minimum overlap comes from a nonoverlapping clone, owing either to random false matches or to repeat-family homology. Based on the minimum overlap, we estimate the number of clones to be entirely sequenced and then, using cost estimates, identify the decision rule (the degree of sequence similarity required before a match is declared between an STC and a clone) that minimizes overall sequencing cost. A method to optimize the overlap decision rule is highly desirable, because both the total cost and the number of problem clones are shown to be highly sensitive to this choice. For a target of 3 Gb containing ∼800 Mb of repeats with 85%–90% identity, we expect <10 problem clones with 15-fold coverage by 150-kb clones. We derive the optimal redundancy and insert sizes of clone libraries for sequencing genomes of various sizes, from microbial to human. We estimate that establishing the resource of STCs as a means of identifying minimally overlapping clones represents only 1%–3% of the total cost of sequencing the human genome, and, up to a point of diminishing returns, a larger STC resource is associated with a smaller total sequencing cost.
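
To give a feel for the scale of the STC resource in the abstract's human-genome example (3-Gb target, 150-kb clones, 15-fold coverage), the arithmetic can be sketched as below. This is a minimal illustration, not the model analyzed in the paper: the function names are hypothetical, STCs are assumed to be placed uniformly at random (a Poisson approximation), and repeats, sequencing errors, and clone orientation are ignored. The 15-kb window in the usage lines is an arbitrary illustrative value.

```python
import math


def stc_resource(genome_bp, clone_bp, coverage):
    """Return (clones, stcs, mean_spacing_bp) for a clone library.

    Assumes every clone has the same insert size and contributes
    two end sequences (one STC per end).
    """
    clones = coverage * genome_bp / clone_bp
    stcs = 2 * clones
    mean_spacing_bp = genome_bp / stcs
    return clones, stcs, mean_spacing_bp


def prob_no_stc(window_bp, mean_spacing_bp):
    """Poisson probability that no STC falls within a window.

    Assumption: STCs are scattered uniformly at random, so the count
    in a window is Poisson with mean window_bp / mean_spacing_bp.
    """
    return math.exp(-window_bp / mean_spacing_bp)


# Parameters taken from the abstract's human-genome example.
clones, stcs, spacing = stc_resource(3_000_000_000, 150_000, 15)

# Hypothetical 15-kb window at the end of a finished clone.
p_empty = prob_no_stc(15_000, spacing)
```

Under these assumptions the library holds 300,000 clones and 600,000 STCs, or one STC every 5 kb on average, so the chance that a given 15-kb window contains no STC at all is exp(-3), roughly 5%.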

Keywords

Genome, Human; Humans; Sequence Analysis, DNA; Sequence Tagged Sites

  • BIP! impact indicators
    selected citations: 23
    popularity: Average
    influence: Top 10%
    impulse: Top 10%