Efficient and low-complexity variable-to-variable length coding for DNA storage

Yunfei Gao; Albert No

BMC Bioinformatics

Article . 2024 . Peer-reviewed

License: CC BY NC ND

Data sources: Crossref, Europe PubMed Central, PubMed Central, DOAJ, DBLP

Publication type: Article, Other literature type

Date: 01 Oct 2024

Language: English

Publisher: Springer Science and Business Media LLC

Journal: BMC Bioinformatics, volume 25 (eISSN: 1471-2105)

Authors: Yunfei Gao; Albert No

doi: 10.1186/s12859-024-05943-y

pmid: 39354338

pmc: PMC11446080

Abstract

Efficient DNA-based storage systems offer substantial capacity and longevity at reduced costs, addressing anticipated data growth. However, encoding data into DNA sequences is limited by two key constraints: 1) a maximum of $h$ consecutive identical bases (homopolymer constraint $h$), and 2) a GC ratio within $[0.5 - c_{GC},\ 0.5 + c_{GC}]$ (GC content constraint $c_{GC}$). Sequencing or synthesis errors tend to increase when these constraints are violated.

In this research, we address a pure source coding problem in the context of DNA storage, considering both homopolymer and GC content constraints. We introduce a novel coding technique that adheres to these constraints while maintaining linear complexity for increased block lengths and achieving near-optimal rates. We demonstrate the effectiveness of the proposed method through experiments on both randomly generated data and existing files. For example, when $h = 4$ and $c_{GC} = 0.05$, the rate reached 1.988, close to the theoretical limit of 1.990. The associated code can be accessed at GitHub.

We propose a variable-to-variable-length encoding method that does not rely on concatenating short predefined sequences and achieves near-optimal rates.
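To make the two constraints concrete, the following is a minimal Python sketch of a validity check for a candidate DNA sequence. It is not the authors' encoder and is not taken from their repository. The function names (`max_homopolymer_run`, `gc_ratio`, `satisfies_constraints`, `homopolymer_capacity`) are hypothetical, and so are two conventions: sequences are assumed to be uppercase strings over A, C, G, T, and an empty sequence is treated as having a GC ratio of 0.5. The last helper estimates the maximum rate, in bits per base, under the homopolymer constraint alone, using the standard run-length counting argument. It ignores the GC constraint, so it is only an upper bound on the combined limit quoted in the abstract.

```python
import math
from itertools import groupby


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases (0 if empty)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def gc_ratio(seq: str) -> float:
    """Fraction of bases that are G or C (assumes uppercase A/C/G/T input).

    Assumption for this sketch: an empty sequence counts as balanced (0.5).
    """
    if not seq:
        return 0.5
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, h: int, c_gc: float) -> bool:
    """True if seq has no run longer than h and GC ratio in [0.5 - c_gc, 0.5 + c_gc]."""
    eps = 1e-12  # tolerance so that boundary ratios such as 0.55 pass despite rounding
    run_ok = max_homopolymer_run(seq) <= h
    gc_ok = abs(gc_ratio(seq) - 0.5) <= c_gc + eps
    return run_ok and gc_ok


def homopolymer_capacity(h: int) -> float:
    """Bits per base achievable under the homopolymer constraint alone.

    The number of length-n sequences with runs of at most h grows like x**n,
    where x is the largest root of x**h = 3 * (x**(h-1) + ... + x + 1).
    The root lies in [3, 4] and is located here by bisection.
    """
    def f(x: float) -> float:
        return x ** h - 3 * sum(x ** i for i in range(h))

    lo, hi = 3.0, 4.0  # f(lo) <= 0 and f(hi) > 0 for every h >= 1
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return math.log2(lo)


if __name__ == "__main__":
    print(satisfies_constraints("ACGTACGT", h=4, c_gc=0.05))
    print(satisfies_constraints("AAAAACGGCC", h=4, c_gc=0.05))
    print(round(homopolymer_capacity(4), 4))
```

With $h = 4$, `homopolymer_capacity` evaluates to roughly 1.9957 bits per base. This sits slightly above the 1.990 limit quoted in the abstract, as one would expect when the GC content constraint is added on top.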

Related Organizations

Yonsei University, Korea (Republic of)
Shanghai Jiao Tong University, China (People's Republic of)
Ruijin Hospital, China (People's Republic of)
Ruijin Hospital, Shanghai Jiaotong University School of Medicine, China (People's Republic of)

Keywords

Base Composition; QH301-705.5; DNA storage; Research; Computer applications to medicine. Medical informatics; R858-859.7; Information Storage and Retrieval; DNA; Sequence Analysis, DNA; Variable-to-variable length code; GC content constraint; Biology (General); Homopolymer constraint; Algorithms

Related research

DNA_storage_channel_codec software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average
