Volume 33, Issue 1
Heterogeneous LBM Simulation Code with LRnLA Algorithms

Vadim Levchenko & Anastasia Perepelkina

Commun. Comput. Phys., 33 (2023), pp. 214-244.

Published online: 2023-02

  • Abstract

A design of a new heterogeneous code for LBM simulations is proposed. By heterogeneous computing we mean a collaborative computation on CPU and GPU, which is characterized by the following features: the data is distributed between CPU and GPU memory spaces, taking advantage of both parallel hierarchies; the capabilities of both SIMT GPU and SIMD CPU parallelization are used for calculations; the algorithms in use efficiently conceal the CPU-GPU data exchange; the computing task is subdivided taking into account the strengths of both processing units: the high performance of the GPU, and the low latency and advanced memory hierarchy of the CPU. This code is a continuation of our work on the development of LRnLA codes for LBM. Previous LRnLA codes had good efficiency for both CPU and GPU computing, and allowed GPU simulation to be performed on data stored in CPU RAM without performance loss on CPU-GPU data transfer. In the new code, we use methods and instruments that can be flexibly adapted to GPU and CPU instruction sets. We present a theoretical study of the performance of the proposed code and suggest implementation techniques. The bottlenecks are identified. As a result, we conclude that larger problems can be simulated with higher efficiency in the heterogeneous system.
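The claim that the CPU-GPU data exchange can be concealed may be illustrated with a minimal two-resource timing model. This is only a sketch under stated assumptions, not the authors' performance model: the function names `step_time` and `transfer_is_hidden` and all numeric values are hypothetical and do not come from the paper. It assumes that, when transfer and computation overlap, the slower of the two determines the time per block of work.

```python
def step_time(compute_s: float, transfer_s: float, overlapped: bool) -> float:
    """Time for one block of work under a simple two-resource model."""
    if overlapped:
        # Transfer runs concurrently with computation, so the slower one dominates.
        return max(compute_s, transfer_s)
    # Without overlap the processor waits for the data, so the costs add up.
    return compute_s + transfer_s


def transfer_is_hidden(compute_s: float, transfer_s: float) -> bool:
    """The exchange is fully concealed when it takes no longer than the computation."""
    return transfer_s <= compute_s


# Hypothetical numbers, chosen only for illustration (not taken from the paper).
compute_s = 4.0e-3   # assumed GPU time to update one block of lattice data
transfer_s = 3.0e-3  # assumed time to move that block between CPU RAM and GPU memory

serial = step_time(compute_s, transfer_s, overlapped=False)
hidden = step_time(compute_s, transfer_s, overlapped=True)
print(f"serial: {serial * 1e3:.1f} ms, overlapped: {hidden * 1e3:.1f} ms")
print("transfer hidden:", transfer_is_hidden(compute_s, transfer_s))
```

In this simplified picture, the transfer stops costing anything once each block carries enough computation to outlast its own data movement, which is the condition checked by `transfer_is_hidden`.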

  • AMS Subject Headings

65Y05, 65Y20, 65-04, 76-10

  • Copyright

COPYRIGHT: © Global Science Press

@Article{CiCP-33-214,
  author  = {Levchenko, Vadim and Perepelkina, Anastasia},
  title   = {Heterogeneous LBM Simulation Code with LRnLA Algorithms},
  journal = {Communications in Computational Physics},
  year    = {2023},
  volume  = {33},
  number  = {1},
  pages   = {214--244},
  issn    = {1991-7120},
  doi     = {10.4208/cicp.OA-2022-0055},
  url     = {http://global-sci.org/intro/article_detail/cicp/21432.html}
}

TY  - JOUR
T1  - Heterogeneous LBM Simulation Code with LRnLA Algorithms
AU  - Levchenko, Vadim
AU  - Perepelkina, Anastasia
JO  - Communications in Computational Physics
VL  - 33
IS  - 1
SP  - 214
EP  - 244
PY  - 2023
DA  - 2023/02
SN  - 1991-7120
DO  - 10.4208/cicp.OA-2022-0055
UR  - https://global-sci.org/intro/article_detail/cicp/21432.html
KW  - LBM
KW  - Roofline
KW  - memory-bound
KW  - GPU
KW  - LRnLA
ER  -

Levchenko, Vadim and Perepelkina, Anastasia. (2023). Heterogeneous LBM Simulation Code with LRnLA Algorithms. Communications in Computational Physics. 33 (1). 214-244. doi:10.4208/cicp.OA-2022-0055