publication . Preprint . 2015

Identifying Dwarfs Workloads in Big Data Analytics

Gao, Wanling; Luo, Chunjie; Zhan, Jianfeng; Ye, Hainan; He, Xiwen; Wang, Lei; Zhu, Yuqing; Tian, Xinhui
Open Access English
  • Published: 26 May 2015
Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, the wide coverage and great complexity of big data computing impose big challenges on big data benchmarking. How can we construct a benchmark suite using a minimum set of units of computation to represent the diversity of big data analytics workloads? Big data dwarfs are abstractions extracted from frequently appearing operations in big data computing. One dwarf represents one unit of computation, and big data workloads are decomposed into one or more dwarfs. Furthermore, dwarfs workloads rather than vast real workloads are more cost-effici...
free text keywords: Computer Science - Databases