descriptionPublicationkeyboard_double_arrow_right Article , Conference object 01 Dec 2016Publisher:IEEEJournal:2016 IEEE International Conference on Big Data (Big Data)

Authors: Saliya Ekanayake; Supun Kamburugamuve; Pulasthi Wickramasinghe; Fox, Geoffrey Charles;

doi: 10.1109/bigdata.2016.7840622 , 10.13140/rg.2.1.4560.1524

Java thread and process performance for parallel machine learning on multicore HPC clusters

- Summary
- Metrics

Abstract

The growing use of Big Data frameworks on large machines highlights the importance of performance issues and the value of High Performance Computing (HPC) technology. This paper looks carefully at three major frameworks Spark, Flink and Message Passing Interface (MPI) both in scaling across nodes and internally over the many cores inside modern nodes. We focus on the special challenges of the Java Virtual Machine (JVM) using an Intel Haswell HPC cluster with 24 cores per node. Two parallel machine learning algorithms, K-Means clustering and Multidimensional Scaling (MDS) are used in our performance studies. We identify three major issues — thread models, affinity patterns, and communication mechanisms — as factors affecting performance by large factors and show how to optimize them so that Java can match the performance of traditional HPC languages like C. Further we suggest approaches that preserve the user interface and elegant dataflow approach of Flink and Spark but modify the runtime so that these Big Data frameworks can achieve excellent performance and realize the goals of HPC-Big Data convergence.

Related Organizations

DePaul University
United States
Indiana University
United States

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	10
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 10%

Found an issue? Give us feedback

Average

Top 10%

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Upload OA version

Are you the author of this publication? Upload your Open Access version to Zenodo!

It’s fast and easy, just two clicks!

uploadUpload now