A Massively Parallel Semicoarsening Multigrid Linear Solver on Multi-Core and Multi-GPU Architectures

Authors: A. M. Manea; H. A. Tchelepi

Abstract

In this work, we have designed and implemented a massively parallel version of the Semicoarsening Multigrid Solver (Schaffer 1998), which is capable of handling highly heterogeneous and anisotropic 3D reservoirs, on a parallel architecture with multiple GPUs. For comparison purposes, the same algorithm was also implemented on a shared-memory multi-core architecture. The implementation exploits the parallelism in every module of the original Multigrid algorithm, including both the setup stage and the solution stage, without modifying the basic steps of the original algorithm. The benefits of this approach are twofold: it maintains the inherent strong linear convergence of the serial Multigrid algorithm, and it takes advantage of the shared-memory architecture to minimize the need for communication. The design of the algorithm uses a combination of plane relaxation and semicoarsening to efficiently handle anisotropies in 3D (Dendy et al. 1989). Since the z-direction in most reservoir models is a direction of strong coupling compared to the x- and y-directions, semicoarsening is employed in the z-direction, and plane relaxation is applied on x-y planes. Besides the 2D systems that must be solved for plane relaxation, a set of 2D systems must also be solved on each multigrid level during the setup stage to obtain an approximate representation of the exact prolongation operator described in Schaffer (1998). A massively parallel version of the 2D Black-Box Multigrid (Alcouffe et al. 1981) was designed to handle both types of 2D solves. To handle problems involving high anisotropies in the x- and y-directions, the 2D Black-Box Multigrid uses alternating line relaxation, with zebra ordering so that multiple line solves can proceed in parallel.
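As an illustration of why zebra ordering exposes parallelism across line solves, the following is a minimal serial sketch, not the authors' implementation. It assumes a constant-coefficient 5-point stencil with zero Dirichlet boundaries and relaxes along x-lines only. The names `thomas`, `relax_line`, `zebra_sweep` and `residual_norm`, and the coefficients `cx` and `cy`, are hypothetical and chosen for this example.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def relax_line(u, f, j, cx, cy):
    """Solve exactly for row j of u, holding the rows above and below fixed."""
    ny, nx = len(u), len(u[0])
    below = u[j - 1] if j > 0 else [0.0] * nx
    above = u[j + 1] if j < ny - 1 else [0.0] * nx
    rhs = [f[j][i] + cy * (below[i] + above[i]) for i in range(nx)]
    a = [0.0] + [-cx] * (nx - 1)
    c = [-cx] * (nx - 1) + [0.0]
    b = [2.0 * (cx + cy)] * nx
    u[j] = thomas(a, b, c, rhs)


def zebra_sweep(u, f, cx, cy):
    """One zebra sweep: all even rows first, then all odd rows."""
    for colour in (0, 1):
        # Rows of one colour read only rows of the other colour, so the
        # iterations of this inner loop are independent of each other.
        for j in range(colour, len(u), 2):
            relax_line(u, f, j, cx, cy)


def residual_norm(u, f, cx, cy):
    """Max-norm of f - A u for the assumed 5-point stencil."""
    ny, nx = len(u), len(u[0])

    def at(j, i):
        return u[j][i] if 0 <= j < ny and 0 <= i < nx else 0.0

    worst = 0.0
    for j in range(ny):
        for i in range(nx):
            au = (2.0 * (cx + cy) * u[j][i]
                  - cx * (at(j, i - 1) + at(j, i + 1))
                  - cy * (at(j - 1, i) + at(j + 1, i)))
            worst = max(worst, abs(f[j][i] - au))
    return worst
```

Calling `zebra_sweep` repeatedly on a small grid with `cx` much larger than `cy` drives `residual_norm` down quickly, because the lines are aligned with the strongly coupled direction. Alternating line relaxation would add the analogous sweep along y-lines.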
Due to the inherent granularity difference between GPU threads and multi-core threads, line relaxation was designed to use the Thomas algorithm on the multi-core architecture and Parallel Cyclic Reduction (NVIDIA Corporation 2014b) on the GPU architecture. In both the 3D Semicoarsening Multigrid and the 2D Black-Box Multigrid, V-cycling was used to avoid spending more time at coarser levels, which would reduce the parallel efficiency. To minimize the expensive communication between the host and the GPU (and among GPUs), every 2D solve is handled explicitly by a single GPU. The two versions of the solver were tested on various highly heterogeneous multi-million-cell problems derived from the SPE10 Second Dataset benchmark. For sufficiently large problems, the GPU implementation, running on Kepler-based K40c cards, was always found to be faster than the multi-core implementation running on 12 Intel® Xeon® E5-2620 v2 2.10 GHz cores. In addition, the inherently serial nature of multiplicative multigrid, along with the approach taken to minimize communication over PCIe, was found to limit scalability beyond 3-4 cores/GPUs.
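The granularity argument can be made concrete with a textbook version of parallel cyclic reduction, sketched here in serial Python under stated assumptions; it is not the GPU kernel referred to above. The function name `pcr` is hypothetical. At each stride `s`, every row eliminates its couplings to rows `i - s` and `i + s` using only values from the previous step, so all rows can be updated at once (for example, one fine-grained thread per row). The Thomas algorithm, by contrast, is a sequential recurrence along the line.

```python
def pcr(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction.

    a: sub-diagonal (a[0] ignored), b: main diagonal,
    c: super-diagonal (c[-1] ignored), d: right-hand side.
    Assumes the system is well conditioned, e.g. diagonally dominant.
    """
    n = len(d)
    a, b, c, d = list(a), list(b), list(c), list(d)   # work on copies
    a[0] = 0.0
    c[-1] = 0.0
    s = 1
    while s < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        # Every iteration below reads only the old a, b, c, d,
        # so the rows are independent within one reduction step.
        for i in range(n):
            lo, hi = i - s, i + s
            if lo >= 0:
                alpha = -a[i] / b[lo]      # eliminates coupling to row lo
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                gamma = -c[i] / b[hi]      # eliminates coupling to row hi
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        s *= 2                              # couplings now reach 2*s away
    # All off-diagonal couplings are gone: each row is b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]
```

The loop over strides runs about log2(n) times, each doing O(n) independent work. The total operation count is therefore higher than for the Thomas algorithm, which is the usual trade-off accepted in exchange for fine-grained parallelism.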
