
This dataset contains 1,077 FASTA files and CSV files. Each FASTA file contains 25-character sequences that are similar to each other. There is one CSV file for each tool (i.e., minimap2 and BLEND) and configuration (i.e., a different number of neighbors in BLEND). The CSV files list the non-identical k-mer pairs (16-mers) that generate the same hash value (i.e., collisions). These k-mers are extracted from sequences that are similar to each other. Each line shows the hash value of the k-mers, the sequence pair that the k-mers are extracted from, the k-mer pair that generates the same hash value, and the edit distance between these k-mers.
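As a rough illustration of how a collision record could be sanity-checked, the sketch below recomputes the edit distance (Levenshtein distance) between the two k-mers of each row and compares it with the reported value. The column order (hash, sequence 1, sequence 2, k-mer 1, k-mer 2, edit distance), the comma delimiter, and the absence of a header row are assumptions made for this example, not the documented layout of the CSV files. The function names and the sample row are hypothetical and made up for illustration.

```python
import csv
import io


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def check_rows(handle):
    """Yield (hash, kmer1, kmer2, reported, recomputed) for each CSV row.

    Assumed (hypothetical) column order:
    hash, sequence 1, sequence 2, k-mer 1, k-mer 2, edit distance.
    """
    for row in csv.reader(handle):
        hash_value, _seq1, _seq2, kmer1, kmer2, reported = row
        yield hash_value, kmer1, kmer2, int(reported), edit_distance(kmer1, kmer2)


# Made-up sample row, not taken from the dataset: two 25-character
# sequences, one 16-mer from each, and a placeholder hash value.
SAMPLE = (
    "42,"
    "ACGTACGTACGTACGTACGTACGTA,"
    "ACGTACGTACGTACGAACGTACGTA,"
    "ACGTACGTACGTACGT,"
    "ACGTACGTACGTACGA,"
    "1\n"
)

for hash_value, k1, k2, reported, recomputed in check_rows(io.StringIO(SAMPLE)):
    status = "ok" if reported == recomputed else "mismatch"
    print(hash_value, k1, k2, reported, recomputed, status)
```

To run this against a real file, replace the `io.StringIO(SAMPLE)` handle with an opened CSV file and adjust the unpacking in `check_rows` to the actual column layout.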
