Searching Genomic Databases using the Prime Factor Filter

Authors: Russel Pears; Jimmy Ee;

Abstract

The major bottleneck in searching genomic databases is the sheer size of the databases involved. A number of different solutions to the problem of aligning query sequences to genomic databases have been proposed, including the widely used BLAST and FASTA systems. While such systems are effective for traditional applications such as query alignment, they do not scale well to applications such as whole genome shotgun sequencing and all-versus-all comparisons of one organism against another. The latter application has quadratic time complexity in the size of the databases involved and requires a different approach from BLAST-type search engines, which rely on a linear scan of the database. Our approach relies on a two-stage filter to prune a significant fraction of the database prior to alignment. The filter uses the MRS index [8] as the first stage, followed by a novel indexing scheme that we propose in this paper. The MRS index screens sequences that map to the same frequency vector and has been shown to produce speedups of up to a factor of 12 over systems that do not employ such an index. However, the MRS index cannot separate sequences that are inherently different yet still map to the same frequency vector. Our filter, based on the prime factor indexing scheme, eliminates a large fraction of the false positives that survive the MRS index. Our experiments show that at least 75% of the false positives are eliminated, resulting in speedups of up to 5 times over the MRS indexing scheme.
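
The false-positive problem described above can be illustrated with a small sketch. It assumes that a frequency vector is simply the count of each nucleotide in a sequence over the alphabet A, C, G, T; the function names `frequency_vector` and `mismatches` are hypothetical and are not taken from the paper or from the MRS index implementation. The second-stage prime factor scheme is not specified in this abstract, so it is not sketched here.

```python
from collections import Counter
from typing import Tuple

# Assumed alphabet for DNA sequences (illustrative choice).
ALPHABET = "ACGT"


def frequency_vector(seq: str) -> Tuple[int, ...]:
    """Count the occurrences of each nucleotide, in ALPHABET order."""
    counts = Counter(seq)
    return tuple(counts[base] for base in ALPHABET)


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


query = "AACCGGTT"
candidate = "TTGGCCAA"

# Both sequences contain two of each nucleotide, so a filter that
# compares only frequency vectors cannot tell them apart.
same_vector = frequency_vector(query) == frequency_vector(candidate)

# Yet the sequences disagree at every position.
differing_positions = mismatches(query, candidate)

print(same_vector, differing_positions)
```

A candidate such as this one passes a frequency-vector screen and still has to be rejected later, which is the kind of false positive that a second filtering stage aims to remove before alignment.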
