
doi: 10.1093/bib/bby072
pmid: 30137230
Abstract
Motivation: With recent advances in DNA sequencing technologies, studying the genetic composition of living organisms has become more accessible to researchers, and this has enabled several advances, especially in the health sciences. However, many challenges arising from the complexity of sequencing projects remain unsolved. One of them is assembling DNA fragments from previously unsequenced organisms, a problem classified as NP-hard (nondeterministic polynomial-time hard), for which no known algorithm finds an exact solution in reasonable execution time. Several tools that produce approximate solutions have nevertheless been used, with results that have facilitated scientific discoveries, although there is ample room for improvement. As with other NP-hard problems, machine learning algorithms have been among the approaches explored in recent years to find better solutions to the DNA fragment assembly problem, although so far only on a small scale.
Results: This paper presents a broad review of pioneering literature on artificial intelligence-based DNA assemblers, particularly those that use machine learning, to provide an overview of state-of-the-art approaches and to serve as a starting point for further study in this field.
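To make "approximate solutions" to fragment assembly concrete, the following is a minimal sketch of a classic greedy overlap heuristic: repeatedly merge the two fragments whose suffix/prefix overlap is longest. It is not one of the machine learning assemblers surveyed in the paper, and the function names (`overlap`, `remove_contained`, `greedy_assemble`) and the `min_len` threshold are hypothetical choices for illustration only. It assumes error-free reads from a single strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` characters long.
    """
    # Try the longest possible overlap first, then shrink.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def remove_contained(reads: list[str]) -> list[str]:
    """Drop duplicate reads and reads that occur inside another read."""
    unique = list(dict.fromkeys(reads))  # dedupe, preserving order
    return [r for r in unique if not any(r != o and r in o for o in unique)]


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the largest overlap.

    Returns a list of contigs; more than one contig is returned when no
    remaining pair overlaps by at least `min_len` characters.
    """
    contigs = remove_contained(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        # Exhaustively score every ordered pair (a before b).
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no sufficient overlap left: stop with separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = remove_contained(contigs + [merged])
    return contigs


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Each greedy step commits to the locally best merge and never revisits it, which keeps the heuristic fast but means it can misassemble genomes containing repeats longer than the overlaps it relies on. That gap between fast heuristics and exact solutions is the room for improvement the abstract refers to.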
Artificial intelligence, Genome assembly, Genome, High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA, [INFO] Computer Science [cs], 004, Machine learning, De novo assembly, Metagenomics, [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM], Algorithms
