Adv. Appl. Math. Mech., 12 (2020), pp. 1247-1260.
Published online: 2020-07
This paper compares the acceleration performance of parallel CPU and GPU algorithms for the two-dimensional Unified Gas-Kinetic Scheme (UGKS). Such a comparison can provide guidance when the UGKS needs to be accelerated. To this end, parallel algorithms are implemented on both CPU and GPU devices, and their speedups are measured on a two-dimensional channel flow case. Based on the multiscale feature of the UGKS, the GPU algorithm adopts a two-level fine-grain parallel strategy covering both physical space and velocity space. The parallel CPU algorithm uses a two-dimensional block layout that likewise parallelizes the spatial and velocity coordinates. A series of meshes of different sizes is tested to show how the performance of the two algorithms evolves, with special attention paid to cases where the discrete velocity space is large. The comparisons show that the proposed fine-grain GPU algorithm can exploit the multiscale feature of the UGKS and provides significant speedups, especially on the latest GPU tested. On the other hand, the parallel CPU strategy may give more predictable and preferable performance when the number of velocity-space grid points is large.
ISSN: 2075-1354
DOI: https://doi.org/10.4208/aamm.OA-2019-0147
URL: http://global-sci.org/intro/article_detail/aamm/17747.html