
doi: 10.2144/98246bc04
pmid: 9631194
To our knowledge, the most widely used automated DNA sequencers are manufactured by PE Applied Biosystems (Foster City, CA, USA). Our laboratory uses Model 377 Sequencers equipped with the XL upgrade (i.e., collection software Version 2.0 and analysis software Version 3.0). Power transients, network noise, an excessively fragmented hard disk or similar incidents can cause a sequencer to stop data collection prematurely. When this happens, the most common way to salvage as much of the data as possible is simply to restart the data collection software. As a result, two (or more) gel files are created, each containing a fragment of the data from the same DNA samples.
In our laboratory, this situation occurred in an experiment in which a sample of pGEM labeled with Energy Transfer Dye Primers (Amersham Pharmacia Biotech, Piscataway, NJ, USA) was run. The first gel-file fragment was 1624 scans long, and the second was 6792 scans long. Figure 1 shows the two electropherograms produced. The first trace fragment (Figure 1A) is completely unusable, while the second (Figure 1B) has five base calls that deviate from the known sequence of pGEM (3); these errors are underlined in the figure. PE Applied Biosystems sequencing analysis software calculates the base spacing by averaging the peak-to-peak distance in the raw electropherogram between scans 1000 and 2000 relative to the location of the primer peak (2). Base-calling from fragmented gel files can therefore be problematic if the analysis software applies this criterion to evaluate base spacing when the primer peak is not where the software assumes it is.
After applying a short C program called GelWeld, we reconstructed one large gel file from the two gel-file fragments. Figure 2 shows the electropherogram extracted from the reconstructed gel file. Note that the three base-calling errors present in Figure 1B have been corrected.
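The base-spacing rule described above can be illustrated with a small C sketch. It is not the PE Applied Biosystems algorithm, whose details are not given here; the function name `mean_peak_spacing`, the simple local-maximum peak test and the absolute scan window are all assumptions made for illustration. A caller wishing to mimic the rule would shift the window by the location of the primer peak.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean distance, in scans, between consecutive local
 * maxima of `trace` whose index lies in the window [lo, hi).
 * Returns -1.0 when fewer than two peaks fall inside the window, for
 * example when the trace ends before the window begins. */
double mean_peak_spacing(const double *trace, size_t n, size_t lo, size_t hi)
{
    size_t prev = 0, peaks = 0;
    double total = 0.0;

    assert(trace != NULL);
    if (hi > n) hi = n;                  /* clamp the window to the data */

    for (size_t i = (lo > 0 ? lo : 1); i + 1 < hi; i++) {
        /* Assumed peak test: higher than the left neighbour and not
         * lower than the right neighbour. */
        if (trace[i] > trace[i - 1] && trace[i] >= trace[i + 1]) {
            if (peaks > 0) total += (double)(i - prev);
            prev = i;
            peaks++;
        }
    }
    return peaks >= 2 ? total / (double)(peaks - 1) : -1.0;
}
```

With a fragment such as the 1624-scan file, only part of a 1000 to 2000 window contains data, and the primer peak may be absent altogether, so an estimate of this kind rests on fewer peaks taken from the wrong region of the run.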
GelWeld works by manipulating the data storage architecture of the Macintosh computer (Apple Computer, Cupertino, CA, USA) in general and of PE Applied Biosystems gel files in particular, both of which have been discussed elsewhere (1). Specifically, it copies the information resident in the data forks of both fragment files over the data fork of a third “template” file. The original contents of the template file’s data fork are deleted in the process, so it is important to make a backup copy of the template file before running GelWeld. GelWeld is available for download at http://gestec.swmed.edu/gestec2.htm along with a complete set of instructions. A copy of the source code is also provided should
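The copy step described above can be sketched in C as follows. This is not the GelWeld source: it assumes that the data fork can be treated as an ordinary byte stream and that welding amounts to writing the first fragment followed by the second, which may omit gel-file details that the real program handles. The names `append_file` and `weld_fragments` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: append the whole of `src_path` to the open stream
 * `dst`. Returns 0 on success and -1 on any I/O error. */
static int append_file(FILE *dst, const char *src_path)
{
    unsigned char buf[4096];
    size_t got;
    int status = 0;
    FILE *src;

    assert(dst != NULL);
    src = fopen(src_path, "rb");
    if (src == NULL) return -1;

    while ((got = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, got, dst) != got) { status = -1; break; }
    }
    if (ferror(src)) status = -1;
    fclose(src);
    return status;
}

/* Overwrite `template_path` with fragment A followed by fragment B.
 * Opening with "wb" truncates the template, so its old contents are lost. */
int weld_fragments(const char *frag_a, const char *frag_b,
                   const char *template_path)
{
    FILE *out = fopen(template_path, "wb");
    int status;

    if (out == NULL) return -1;
    status = append_file(out, frag_a);
    if (status == 0) status = append_file(out, frag_b);
    if (fclose(out) != 0) status = -1;
    return status;
}
```

Because the template is truncated as soon as it is opened, a failed run can leave it empty or incomplete, which is one more reason to keep the backup copy recommended above.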
Electronic Data Processing, QH301-705.5, Image Processing, Computer-Assisted, Computational Biology, Database Management Systems, Sequence Analysis, DNA, Biology (General), Software
