
doi: 10.1109/dcc.2016.73
A classical compression method for trees is to exploit subtree repeats in the structure by representing them by directed acyclic graphs. We propose a lossy compression method that consists in computing a structure with high redundancy that approximates the initial data.

Trees are commonly used to represent hierarchical data appearing in computer science or in biology. Compression methods often take advantage of repeated substructures appearing in the tree (see the survey [1]). Directed Acyclic Graph (DAG) compression is a classical approach that exploits subtree repeats in the structure. However, it should be noted that trees without a high level of redundancy are often insufficiently compressed by this procedure. Self-nested trees are such that all their complete subtrees of a given height are isomorphic. The systematic repetition of subtrees gives them remarkable compression properties by this approach.

We address lossy compression for unordered trees. Loss can be acceptable for visual representation of scenes composed of plants, for example. Our method consists in computing the DAG version of a self-nested tree that closely approximates the tree to compress. A first approximation has been proposed in [2] in which the authors compute in polynomial time the Nearest Embedding Self-nested Tree (NEST) of the initial structure, namely the self-nested tree that minimizes the edit distance to the initial tree and that embeds it. We focus on the presentation of two new algorithms to find a self-nested structure that approximates the initial tree better than the NEST. These solutions rely on a technique to find the centroid of a forest of small height and may be computed in polynomial time for trees with bounded degree. We prove on a simulated dataset that the error rates of these lossy compression methods are always better than the loss involved in the previous algorithm (on average, we observe a substantial gain of around 20%), while the compression rates are equivalent.
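The two notions the abstract builds on, lossless DAG compression of an unordered tree and the self-nested property, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it covers neither the NEST nor the centroid-based approximations. It assumes a tree is encoded as a nested tuple of its child subtrees (a leaf is `()`), and the function names `dag_compress`, `is_self_nested` and `tree_size` are hypothetical.

```python
def dag_compress(tree):
    """Merge isomorphic subtrees of an unordered tree into shared DAG nodes.

    A tree is a tuple of child subtrees; a leaf is the empty tuple.
    Returns (root_id, dag, heights), where dag maps a node id to the
    sorted tuple of its child ids and heights maps a node id to its height.
    """
    ids = {}      # canonical signature -> DAG node id
    heights = {}  # DAG node id -> height of the subtree it represents

    def visit(t):
        # Sorting the child ids makes the signature independent of child
        # order, so isomorphic unordered subtrees get the same id.
        sig = tuple(sorted(visit(c) for c in t))
        if sig not in ids:
            ids[sig] = len(ids)
            heights[ids[sig]] = 1 + max((heights[c] for c in sig), default=-1)
        return ids[sig]

    root = visit(tree)
    dag = {node: sig for sig, node in ids.items()}
    return root, dag, heights


def is_self_nested(tree):
    """True if all complete subtrees of a given height are isomorphic.

    Every height from 0 to the root height occurs in the tree, so the
    property holds exactly when the DAG has one node per height.
    """
    root, dag, heights = dag_compress(tree)
    return len(dag) == heights[root] + 1


def tree_size(tree):
    """Number of nodes in the original (uncompressed) tree."""
    return 1 + sum(tree_size(c) for c in tree)


if __name__ == "__main__":
    leaf = ()
    cherry = (leaf, leaf)            # a node with two leaf children
    nested = (cherry, cherry, leaf)  # both height-1 subtrees are isomorphic
    mixed = ((leaf,), cherry)        # two different subtrees of height 1

    for name, t in [("nested", nested), ("mixed", mixed)]:
        _, dag, _ = dag_compress(t)
        print(name, tree_size(t), len(dag), is_self_nested(t))
```

In this encoding a self-nested tree of height h compresses to a DAG with exactly h + 1 nodes, which is the remarkable compression property the abstract refers to. A tree with two non-isomorphic subtrees of the same height needs more DAG nodes than that.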
[MATH.MATH-CO] Mathematics [math]/Combinatorics [math.CO], [INFO.INFO-IT] Computer Science [cs]/Information Theory [cs.IT]
