K-mer collision statistics (BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis)

Name: K-mer collision statistics (BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis)
Creator: Firtina, Can

ZENODO

Dataset. 2022

License: CC BY

Data sources: Datacite

Research data, Dataset. 14 Nov 2022. Publisher: Zenodo

Authors: Firtina, Can

doi: 10.5281/zenodo.7319785, 10.5281/zenodo.7319786

Abstract

This dataset contains 1,077 FASTA files and CSV files. Each FASTA file contains 25-character sequences that are similar to each other. There is one CSV file for each tool (i.e., minimap2 and BLEND) and configuration (i.e., a different number of neighbors in BLEND). The CSV files list the non-identical k-mer pairs (16-mers) that generate the same hash value (i.e., collisions). These k-mers are extracted from sequences that are similar to each other. Each line shows the hash value of the k-mers, the sequence pair that the k-mers are extracted from, the k-mer pair that generates the same hash value, and the edit distance between these k-mers.
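
As a rough illustration of what one row of such a collision table represents, the following Python sketch extracts 16-mers from two sequences, groups them by hash value, and reports non-identical k-mer pairs that share a hash together with their edit distance. The `toy_hash` function is a hypothetical stand-in and is not the hash used by minimap2 or BLEND; the field order of each row and all helper names are assumptions made for this example, not the layout of the actual CSV files.

```python
from collections import defaultdict

K = 16  # k-mer length stated in the abstract


def kmers(seq, k=K):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def toy_hash(kmer, bits=8):
    """Hypothetical low-resolution hash, NOT the minimap2 or BLEND hash."""
    value = 0
    for ch in kmer:
        value = (value * 31 + ord(ch)) % (1 << bits)
    return value


def collisions(seq_a, seq_b, hash_fn=toy_hash):
    """Return rows (hash, seq_a, seq_b, kmer_a, kmer_b, edit_distance)
    for non-identical k-mers of seq_a and seq_b that share a hash value.
    The row layout is an assumption for illustration only."""
    buckets = defaultdict(set)
    for km in kmers(seq_b):
        buckets[hash_fn(km)].add(km)
    rows = []
    for ka in kmers(seq_a):
        h = hash_fn(ka)
        for kb in sorted(buckets.get(h, ())):
            if ka != kb:  # identical k-mers are not counted as collisions
                rows.append((h, seq_a, seq_b, ka, kb, edit_distance(ka, kb)))
    return rows
```

Passing a different `hash_fn` lets the same loop be reused to compare how often different hashing schemes map distinct but similar k-mers to the same value, which is the kind of comparison the per-tool CSV files appear to support.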

Related Organizations

ETH Zurich
Switzerland

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
