Big data frameworks, such as Spark and Giraph, suffer from high memory pressure because they allocate massive volumes of long-lived objects on the managed heap. To relieve this pressure, frameworks temporarily move long-lived objects outside the managed heap (off-heap) onto a fast storage device. Unfortunately, this practice results in (1) high serialization/deserialization (S/D) cost and (2) high garbage collection (GC) cost when many off-heap objects are moved back to the managed heap for processing. In this paper, we propose HugeHeap, which extends the managed runtime (JVM) to use a second, high-capacity heap over a fast storage device that coexists with the regular heap. HugeHeap provides direct access to objects on the second heap, with no S/D. It also reduces GC cost by fencing the garbage collector from scanning the second heap. HugeHeap leverages the fact that frameworks already choose specific objects for off-heap placement, and it offers frameworks a hint-based interface for moving such objects to the second heap. We implement HugeHeap in OpenJDK and evaluate it with 15 widely used applications in two real-world big data frameworks, Spark and Giraph. Our evaluation shows that HugeHeap improves performance by up to 83% compared to native Spark and Giraph, while consuming up to 87% less DRAM capacity. Finally, it outperforms Panthera, a garbage collector specialized for hybrid memories, by up to 69%.
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 26 |
| downloads | | 6 |

Views provided by UsageCounts
Downloads provided by UsageCounts