
AbstractA wordxthat is absent from a wordyis calledminimalif all its proper factors occur iny. Given a collection ofkwordsy1, … ,ykover an alphabetΣ, we are asked to compute the set$\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$M{y1,…,yk}ℓof minimal absent words of length at mostℓof the collection {y1, … ,yk}. The set$\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$M{y1,…,yk}ℓcontains all the wordsxsuch thatxis absent from all the words of the collection while there existi,j, such that the maximal proper suffix ofxis a factor ofyiand the maximal proper prefix ofxis a factor ofyj. In data compression, this corresponds to computing the antidictionary ofkdocuments. In bioinformatics, it corresponds to computing words that are absent from a genome ofkchromosomes. Indeed, the set$\mathrm {M}^{\ell }_{y}$Myℓof minimal absent words of a wordyis equal to$\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$M{y1,…,yk}ℓfor any decomposition ofyinto a collection of wordsy1, … ,yksuch that there is an overlap of length at leastℓ− 1 between any two consecutive words in the collection. This computation generally requiresΩ(n) space forn= |y| using any of the plenty available$\mathcal {O}(n)$O(n)-time algorithms. This is because anΩ(n)-sized text index is constructed overywhich can be impractical for largen. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when$\| \mathrm {M}^{\ell }_{\{y_1,\ldots ,y_N\}}\| =o(n)$∥M{y1,…,yN}ℓ∥=o(n), for allN∈ [1,k], where ∥S∥ denotes the sum of the lengths of words in setS. For instance, in the human genome,n≈ 3 × 109but$\| \mathrm {M}^{12}_{\{y_1,\ldots ,y_k\}}\| \approx 10^{6}$∥M{y1,…,yk}12∥≈106. We consider a constant-sized alphabet for stating our results. We show thatall$\mathrm {M}^{\ell }_{y_{1}},\ldots ,\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$My1ℓ,…,M{y1,…,yk}ℓcan be computed in$\mathcal {O}(kn+{\sum }^{k}_{N=1}\| \mathrm {M}^{\ell }_{\{y_1,\ldots ,y_N\}}\| )$O(kn+∑N=1k∥M{y1,…,yN}ℓ∥)total time using$\mathcal {O}(\textsc {MaxIn}+\textsc {MaxOut})$O(MaxIn+MaxOut)space, where MaxIn is the length of the longest word in {y1, … ,yk} and$\textsc {MaxOut}=\max \limits \{\| \mathrm {M}^{\ell }_{\{y_1,\ldots ,y_N\}}\| :N\in [1,k]\}$MaxOut=max{∥M{y1,…,yN}ℓ∥:N∈[1,k]}. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution.
Antidictionary, absent word, 000, Output sensitive algorithm, Analysis of algorithms and problem complexity, string algorithm, Algorithms on strings, output-sensitive algorithm, 004, String algorithm, Data compression, antidictionary, [INFO]Computer Science [cs], Absent word, Genetics and epigenetics, Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science), data compression
Antidictionary, absent word, 000, Output sensitive algorithm, Analysis of algorithms and problem complexity, string algorithm, Algorithms on strings, output-sensitive algorithm, 004, String algorithm, Data compression, antidictionary, [INFO]Computer Science [cs], Absent word, Genetics and epigenetics, Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science), data compression
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
