
This research proposes an automated configuration-parameter classification model for setting up an optimized Hive query processing environment on the Apache Hadoop Distributed File System. In this model, the Analysis statistic command is issued to measure the expected performance of Hive tables on the Hadoop YARN platform under varying combinations of parameter settings. The e-heuristic methodology is applied to shrink the parameter search space effectively during the automated tuning process. We control the transition between evaluation spaces using one main parameter and one auxiliary parameter, which are expected to reach the global optimum in each evaluation space. The model identifies the Hive parameters that access Hive tables optimally and is expected to improve query execution time by 15% compared with the default Hive settings.
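The staged search described above can be illustrated with a minimal sketch. It is not the authors' implementation. The parameter pairings, the candidate values, and the `toy_cost` function are assumptions made purely for illustration, and in a real setting the cost would come from measured or estimated query performance. The idea shown is that each evaluation space varies only one main and one auxiliary parameter, keeps the best pair, and then moves on to the next space, so far fewer combinations are evaluated than in a full grid.

```python
from itertools import product

# Hypothetical starting configuration (illustrative values, not Hive's documented defaults).
DEFAULTS = {
    "hive.exec.parallel": False,
    "hive.vectorized.execution.enabled": False,
    "hive.exec.reducers.bytes.per.reducer": 256_000_000,
    "hive.auto.convert.join": False,
}

# Each evaluation space pairs one main parameter with one auxiliary parameter.
# The pairings and candidate values are assumptions for this sketch.
SPACES = [
    (("hive.exec.parallel", [False, True]),
     ("hive.vectorized.execution.enabled", [False, True])),
    (("hive.exec.reducers.bytes.per.reducer", [64_000_000, 128_000_000, 256_000_000]),
     ("hive.auto.convert.join", [False, True])),
]


def toy_cost(cfg):
    """Synthetic stand-in for a measured query time (lower is better)."""
    cost = 100.0
    if cfg["hive.exec.parallel"]:
        cost -= 10
    if cfg["hive.vectorized.execution.enabled"]:
        cost -= 5
    cost += abs(cfg["hive.exec.reducers.bytes.per.reducer"] - 128_000_000) / 64_000_000
    if cfg["hive.auto.convert.join"]:
        cost -= 3
    return cost


def tune(spaces, cost, defaults):
    """Search one evaluation space at a time, carrying the best settings forward."""
    config = dict(defaults)
    best_cost = cost(config)
    evaluations = 1
    for (main_name, main_values), (aux_name, aux_values) in spaces:
        for main_value, aux_value in product(main_values, aux_values):
            candidate = {**config, main_name: main_value, aux_name: aux_value}
            candidate_cost = cost(candidate)
            evaluations += 1
            if candidate_cost < best_cost:
                best_cost, config = candidate_cost, candidate
    return config, best_cost, evaluations


best_config, best_cost, evaluations = tune(SPACES, toy_cost, DEFAULTS)
print(best_config, best_cost, evaluations)
```

With these toy inputs the staged search evaluates 11 configurations (the starting point plus 4 and 6 pairs), whereas a full grid over the same four parameters would need 24. Searching one space at a time only finds the overall best setting when the spaces interact weakly, which is an assumption of this sketch.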
