A technique for overlapping computation and communication for block recursive algorithms

Keywords: distributed-memory parallel programs, Theory of software, Theory of programming languages, Parallel algorithms in computer science, performance

Sandeep K. S. Gupta; Chua-Huang Huang; P. Sadayappan; Rodney W. Johnson

Concurrency Practice and Experience

Article . 1998 . Peer-reviewed

License: Wiley TDM

Data sources: Crossref

Publication: Article, 01 Feb 1998, English
Publisher: Wiley
Journal: Concurrency: Practice and Experience, volume 10, pages 73-90 (ISSN: 1040-3108, eISSN: 1096-9128)

doi: 10.1002/(sici)1096-9128(199802)10:2<73::aid-cpe289>3.0.co;2-n

Abstract

Summary: This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. The methodology is suited to most modern supercomputers that have a distributed-memory architecture with a circuit-switched or wormhole-routed mesh or hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.
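
The abstract does not reproduce any formulas, so the following is only an illustrative sketch of what a tensor-product representation of a block recursive algorithm looks like. It uses the standard Cooley-Tukey factorization of the DFT matrix, F_rs = (F_r ⊗ I_s) T (I_r ⊗ F_s) L, where L is a stride permutation and T a diagonal matrix of twiddle factors. This is a textbook identity and is not claimed to be the exact formulation used in the paper. All function names below are hypothetical and chosen for illustration only.

```python
import cmath

def dft_matrix(n):
    """n x n DFT matrix with entries w^(j*k), w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def kron(a, b):
    """Tensor (Kronecker) product of two dense matrices given as nested lists."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def stride_permutation(r, s):
    """Permutation matrix L with (L x)[j*s + i] = x[i*r + j]: reads x at stride r."""
    n = r * s
    p = [[0] * n for _ in range(n)]
    for i in range(s):
        for j in range(r):
            p[j * s + i][i * r + j] = 1
    return p

def twiddle(r, s):
    """Diagonal twiddle matrix T with entry w_n^(j*m) at position j*s + m, n = r*s."""
    n = r * s
    w = cmath.exp(-2j * cmath.pi / n)
    t = [[0] * n for _ in range(n)]
    for j in range(r):
        for m in range(s):
            t[j * s + m][j * s + m] = w ** (j * m)
    return t

def cooley_tukey_factors(r, s):
    """Factors of F_(r*s), listed left to right as they appear in the product."""
    return [
        kron(dft_matrix(r), identity(s)),   # F_r (x) I_s
        twiddle(r, s),                      # T
        kron(identity(r), dft_matrix(s)),   # I_r (x) F_s: r independent size-s DFTs
        stride_permutation(r, s),           # L: data reordering
    ]

def product(factors):
    result = factors[0]
    for f in factors[1:]:
        result = matmul(result, f)
    return result

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

if __name__ == "__main__":
    # The factored form should agree with the dense DFT matrix up to rounding.
    print(max_abs_diff(product(cooley_tukey_factors(2, 4)), dft_matrix(8)) < 1e-9)
```

In this style of notation, a factor such as I_r ⊗ F_s is commonly read as r independent DFTs on contiguous blocks (local computation), while the stride permutation describes data movement. How the paper rearranges such factors to overlap the two is not stated in the abstract, and the sketch does not attempt to reproduce it.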

Related Organizations

The Ohio State University (United States)
Colorado State University (United States)
St. Cloud State University (United States)

Impact by BIP!

Selected citations: 4
Popularity: Average
Influence: Average
Impulse: Average
