
While Hadoop ecosystems have become increasingly important to practitioners of large-scale data analysis, they also incur tremendous energy costs. This trend drives the need for energy-efficient Hadoop clusters that reduce the operational costs and carbon emissions associated with their energy consumption. However, despite extensive study of the problem, existing energy-efficiency approaches have not fully considered the heterogeneity of both the workloads and the machine hardware found in production environments. In this paper, we find that heterogeneity-oblivious task assignment approaches are detrimental to both the performance and the energy efficiency of Hadoop clusters. Our observations also show that even heterogeneity-aware techniques aimed at reducing job completion time do not guarantee lower energy consumption on heterogeneous machines. We propose E-Ant, a heterogeneity-aware task assignment approach that aims to reduce the overall energy consumption of a heterogeneous Hadoop cluster without sacrificing job performance. It adaptively schedules heterogeneous workloads on energy-efficient machines without a priori knowledge of the workload properties. E-Ant employs an agile ant colony optimization approach that generates task assignment solutions based on feedback about each task's energy consumption, as reported by Hadoop TaskTrackers. Furthermore, we integrate dynamic voltage and frequency scaling (DVFS) with E-Ant to further improve the energy efficiency of heterogeneous Hadoop clusters: a DVFS controller dynamically scales the CPU frequency of each slave machine in response to time-varying resource demands. Experimental results on a heterogeneous cluster with varying hardware capabilities show that, for a synthetic workload from Microsoft, E-Ant with DVFS improves overall energy savings by 23 and 17 percent compared to Fair Scheduler and Tarazu, respectively.
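To make the feedback loop concrete, the following is a minimal sketch of ant-colony-style task assignment driven by per-task energy reports. It is an illustration under stated assumptions, not E-Ant's published algorithm: the class `AntColonyAssigner`, its methods, the parameters `alpha`, `rho`, and `q`, the machine names, and the energy values are all hypothetical. Each job type keeps a pheromone value per machine; a machine is chosen with probability proportional to its pheromone, and each energy report first evaporates all trails for that job type and then deposits pheromone inversely proportional to the reported energy, so machines that run a job type more cheaply become more likely to receive it.

```python
import random
from collections import defaultdict


class AntColonyAssigner:
    """Hypothetical ACO-style assigner; not the E-Ant implementation.

    pheromone[job_type][machine] grows when a machine reports low energy
    for that job type and decays (evaporates) on every report.
    """

    def __init__(self, machines, alpha=1.0, rho=0.1, q=1.0, seed=0):
        self.machines = list(machines)
        self.alpha = alpha  # exponent applied to pheromone when choosing (assumed)
        self.rho = rho      # evaporation rate in (0, 1) (assumed)
        self.q = q          # deposit scale (assumed)
        self.rng = random.Random(seed)
        # Every job type starts with a uniform trail of 1.0 on every machine.
        self.pheromone = defaultdict(lambda: {m: 1.0 for m in self.machines})

    def probabilities(self, job_type):
        """Selection probability of each machine for a job type.

        Classic ACO also multiplies by a heuristic term; it is omitted here.
        """
        weights = {m: tau ** self.alpha for m, tau in self.pheromone[job_type].items()}
        total = sum(weights.values())
        return {m: w / total for m, w in weights.items()}

    def choose_machine(self, job_type):
        """Sample a machine according to the current probabilities."""
        probs = self.probabilities(job_type)
        machines = list(probs)
        return self.rng.choices(machines, weights=[probs[m] for m in machines], k=1)[0]

    def report_energy(self, job_type, machine, energy_joules):
        """Feedback step: evaporate all trails, then reward the reporting machine."""
        trail = self.pheromone[job_type]
        for m in trail:
            trail[m] *= 1.0 - self.rho
        trail[machine] += self.q / energy_joules


# Usage with made-up numbers: "new-node" reports half the energy of "old-node".
assigner = AntColonyAssigner(["old-node", "new-node"])
for _ in range(50):
    assigner.report_energy("cpu-bound", "old-node", 200.0)
    assigner.report_energy("cpu-bound", "new-node", 100.0)
probs = assigner.probabilities("cpu-bound")
print(probs)
```

Evaporating on every report keeps stale evidence from dominating, which matters when the workload mix changes over time; the trade-off is that a large `rho` makes assignments react noisily to individual reports.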
