Publication: Article, Preprint, Other literature type

Date: 01 Jan 2015

Embargo end date: 01 Jan 2014

Language: English

Publisher: Society for Industrial & Applied Mathematics (SIAM)

Journal: SIAM Journal on Scientific Computing, volume 37, pages C439-C464 (issn: 1064-8275, eissn: 1095-7197)

Authors: Malas, Tareq Majed Yasin; Hager, G.; Ltaief, Hatem; Stengel, H.; Wellein, G.; Keyes, David E.;

doi: 10.1137/140991133 , 10.48550/arxiv.1410.3060

arXiv: http://arxiv.org/abs/1410.3060

handle: 10754/577336

Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates


Abstract

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we combine the ideas of multi-core wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated by the high bytes per lattice update case of variable coefficients. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.
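The diamond tiling idea mentioned in the abstract can be illustrated with a minimal serial sketch. It is not the authors' implementation: it assumes a 1D three-point Jacobi stencil with fixed boundary values, and the names `naive_sweeps`, `diamond_tiled`, and `width` are hypothetical. It leaves out the wavefront component and the thread groups entirely. It only shows that grouping space-time points into diamonds, bounded by the diagonals `t + i` and `t - i`, gives an update order that still respects every stencil dependency, so several time steps can be applied to a small block of data before moving on.

```python
from collections import defaultdict


def naive_sweeps(u0, steps):
    """Reference: one full sweep over the grid per time step."""
    cur = list(u0)
    for _ in range(steps):
        nxt = cur[:]  # boundary values are carried over unchanged
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def diamond_tiled(u0, steps, width):
    """Same updates, reordered into diamond-shaped space-time tiles.

    Assumption for illustration: a 1D 3-point Jacobi stencil and two
    ping-pong buffers; time level t lives in buf[t % 2].
    """
    n = len(u0)
    buf = [list(u0), list(u0)]

    # Assign each update (t, i) to the tile cut out by the two diagonals.
    tiles = defaultdict(list)
    for t in range(1, steps + 1):
        for i in range(1, n - 1):
            tiles[((t + i) // width, (t - i) // width)].append((t, i))

    # Both tile coordinates never decrease along a dependency, so tiles
    # with a smaller coordinate sum must run first; tiles with an equal
    # sum are independent of each other and could run concurrently.
    for a, b in sorted(tiles, key=lambda ab: (ab[0] + ab[1], ab[0])):
        for t, i in sorted(tiles[(a, b)]):  # inside a tile: by time level
            src, dst = buf[(t - 1) % 2], buf[t % 2]
            dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0

    return buf[steps % 2]


if __name__ == "__main__":
    grid = [float((7 * k) % 11) for k in range(40)]
    print(diamond_tiled(grid, 9, 4) == naive_sweeps(grid, 9))
```

Because every point is computed from the same three inputs in the same order of operations, the tiled result matches the naive sweeps exactly. The sketch enumerates all points up front for clarity; a performance-oriented code would instead generate the loop bounds of each diamond directly.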

Related Organizations

King Abdullah University of Science and Technology
Saudi Arabia
University of Erlangen-Nuremberg
Germany

Keywords

wavefront parallelization, FOS: Computer and information sciences, temporal blocking, Analysis of algorithms and problem complexity, multicore, stencil computations, Parallel numerical computation, diamond tiling, Distributed systems, Computer Science - Distributed, Parallel, and Cluster Computing, Distributed algorithms, Distributed, Parallel, and Cluster Computing (cs.DC), Parallel algorithms in computer science, energy-efficient algorithms, Performance evaluation, queueing, and scheduling in the context of computer systems

Related research

likwid software on Google Code (IsRelatedTo)

Impact by BIP!

selected citations: 62 (derived from selected sources; an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically)
popularity: Top 10% (the "current" impact/attention, or "hype", of an article in the research community at large, based on the underlying citation network)
influence: Top 10% (the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically)
impulse: Top 10% (the initial momentum of an article directly after its publication, based on the underlying citation network)


Open access routes: Green, bronze

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
