
Nanopore sequencing is one of the state-of-the-art sequencing technologies. It passes a DNA strand through a pore, which changes the ionic current flowing through the pore. Because of the pore's size, usually five nucleotides (a 5-mer) are inside it at once and influence the measured signal. Each of the 4^5 = 1024 possible 5-mers produces a different signal, and this information is used for basecalling, the conversion of the raw signal into a sequence of nucleotides. The signal is approximately rectangular (piecewise constant) because the 5-mer in the pore changes by one nucleotide at a time, but it contains considerable noise. The goal of this thesis was to develop a basecaller for DNA nanopore sequencing using modern deep learning architectures, with self-supervised learning in mind. The architecture is based mainly on transformers. The basecaller was evaluated on publicly available datasets. The solution, called AttentionCall, was implemented in Python using the PyTorch library. The source code is available on GitHub at github.com/StanislavPavlic/attentioncall.
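As an illustration of the 5-mer model described above, the following minimal sketch enumerates all possible 5-mers and lists the 5-mers that pass through the pore for a short example sequence. The helper names `all_kmers` and `sliding_kmers` and the example sequence are hypothetical and are not taken from the AttentionCall source code.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def all_kmers(k=5):
    """Return every possible k-mer over the four DNA nucleotides."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def sliding_kmers(sequence, k=5):
    """Return the k-mers occupying the pore as the strand advances one nucleotide at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    print(len(all_kmers()))          # 1024, i.e. 4**5
    print(sliding_kmers("ACGTACG"))  # ['ACGTA', 'CGTAC', 'GTACG']
```

Consecutive 5-mers overlap in four nucleotides, which is why the idealized signal moves from one level to the next in discrete steps.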
bioinformatics ; basecalling ; nanopore sequencing ; deep learning ; transformers ; CTC
