
In this work, we present our port of OpenFOAM to GPUs using the C++ standard parallel execution model (stdpar) introduced in ISO C++17. With a minimally intrusive strategy that primarily replaces serial loops with stdpar constructs, we offload the full computational workload of typical CFD simulations to multicore and manycore architectures. This approach is vendor‑agnostic, maintains a single unified codebase, and can be integrated easily into the main OpenFOAM release. We demonstrate performance using the icoFoam and simpleFoam solvers across four test cases: the 3D lid‑driven cavity, the 3D conical diffuser, the HPC motorbike, and the DrivAer automotive geometry. Experiments were conducted on a range of NVIDIA and AMD systems, including CPU‑only, hybrid CPU–GPU, and unified‑memory CPU–GPU configurations. Measured speedups relative to a fully populated 32‑core CPU socket range from 0.4× (a slowdown) to 7.7×, depending on boundary‑condition complexity, turbulence modelling, and solver type. We describe the porting methodology and report the performance results in detail.
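As a minimal sketch of what "replacing serial loops with stdpar constructs" can look like, the example below rewrites a simple field update, y = a·x + y, as a C++17 parallel algorithm. It is illustrative only and is not taken from the OpenFOAM sources: the function names `axpySerial` and `axpyStdpar` are hypothetical, and plain `std::vector<double>` stands in for OpenFOAM's field containers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical serial baseline: y[i] = a * x[i] + y[i] for every element.
void axpySerial(double a, const std::vector<double>& x, std::vector<double>& y)
{
    assert(x.size() == y.size());  // assumption: both fields have the same size
    for (std::size_t i = 0; i < y.size(); ++i)
    {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical stdpar form of the same loop. The execution policy is a
// parameter, so the caller decides how the algorithm may be executed,
// e.g. axpyStdpar(std::execution::par_unseq, a, x, y);
template <class ExecutionPolicy>
void axpyStdpar(ExecutionPolicy&& policy,
                double a,
                const std::vector<double>& x,
                std::vector<double>& y)
{
    assert(x.size() == y.size());  // assumption: both fields have the same size
    // Binary std::transform reads x and y and writes the result back into y.
    std::transform(std::forward<ExecutionPolicy>(policy),
                   x.begin(), x.end(),  // first input range
                   y.begin(),           // second input range
                   y.begin(),           // output range (in place)
                   [a](double xi, double yi) { return a * xi + yi; });
}
```

The loop body is unchanged; only the iteration construct differs, which is what keeps such a port minimally intrusive. Whether a call with `std::execution::par_unseq` runs on CPU threads or on a GPU depends on the compiler and its flags (for example, `nvc++ -stdpar=gpu` in the NVIDIA HPC SDK), not on the source code.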
