Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao IEEE Transactions on...arrow_drop_down
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
IEEE Transactions on Information Theory
Article . 2020 . Peer-reviewed
License: IEEE Copyright
Data sources: Crossref
DBLP
Article
Data sources: DBLP
versions View all 2 versions
addClaim

Efficient Encoding/Decoding of GC-Balanced Codes Correcting Tandem Duplications

Authors: Yeow Meng Chee; Johan Chrisnata; Han Mao Kiah; Tuan Thanh Nguyen;

Efficient Encoding/Decoding of GC-Balanced Codes Correcting Tandem Duplications

Abstract

Tandem duplication is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2017) proposed the study of codes that correct tandem duplications. All code constructions are based on irreducible words . Such code constructions are almost optimal to combat tandem duplications of length at most $k$ where $k\leq 3$ . However, the problem of designing efficient encoder/decoder for such codes has not been investigated. In addition, the method cannot be extended to deal with the case of arbitrary $k$ , where $k\geq 4$ . In this work, we study efficient encoding/decoding methods for irreducible words over general $q$ -ary alphabet. Our methods provide the first known efficient encoder/decoder for $q$ -ary codes correcting tandem duplications of length at most $k$ , where $k\leq 3$ . In particular, we describe an $(\ell,m)$ -finite state encoder and show that when $m=\Theta (1/\epsilon)$ and $\ell =\Theta (1/\epsilon)$ , the encoder achieves rate that is $\epsilon $ away from the optimal rate. We also provide ranking/unranking algorithms for irreducible words and modify the algorithms to reduce the space requirements for the finite state encoder. Over the DNA alphabet (or quaternary alphabet), we also impose weight constraint on the codewords. In particular, a quaternary word is ${\tt GC}$ -balanced if exactly half of the symbols of are either ${\tt C}$ or ${\tt G}$ . Via a modification of Knuth’s balancing technique, we provide an efficient method that translates quaternary messages into ${\tt GC}$ -balanced codewords and the resulting codebook is able to correct tandem duplications of length at most $k$ , where $k\leq 3$ . In addition, we provide the first known construction of codes to combat tandem duplications of length at most $k$ , where $k \geq 4$ . Such codes can correct duplication errors in linear-time and they are almost optimal in terms of rate.

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    14
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Top 10%
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
14
Top 10%
Top 10%
Top 10%
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!