[Openmp-commits] [PATCH] D119313: [OpenMP][CUDA] Set the hard team limit to 2^31-1

Shilei Tian via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Thu Feb 10 10:55:41 PST 2022

tianshilei1992 added a comment.

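Actually, the hard limit of 65536 can help performance in some cases. For example, in the BabelStream benchmark, if we don't cap the number of teams, the grid can have 262144 blocks. After capping to 65536, the average time of the dot kernel drops from 6.83 ms to 3.62 ms (about 1.9x faster), while the other kernels stay within roughly 5% of their uncapped times.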

  Capping to 65536:
              Type  Time(%)      Time     Calls       Avg       Min       Max  Name
   GPU activities:   21.96%  361.91ms       100  3.6191ms  3.4761ms  4.2035ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE3dotEv_l229
                     12.50%  205.93ms       100  2.0593ms  2.0200ms  2.0720ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE5triadEv_l180
                     12.40%  204.35ms       100  2.0435ms  2.0084ms  2.0561ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE3addEv_l155
                      8.57%  141.31ms       100  1.4131ms  1.3905ms  1.4901ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE3mulEv_l132
                      8.53%  140.61ms       100  1.4061ms  1.3885ms  1.4647ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE4copyEv_l108
                      0.20%  3.2532ms         1  3.2532ms  3.2532ms  3.2532ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE11init_arraysEddd_l62
  Not capping, grid size 262144:
              Type  Time(%)      Time     Calls       Avg       Min       Max  Name
   GPU activities:   34.48%  682.98ms       100  6.8298ms  6.6153ms  7.8655ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE3dotEv_l229
                     10.30%  204.15ms       100  2.0415ms  2.0165ms  2.0479ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE5triadEv_l180
                     10.26%  203.31ms       100  2.0331ms  2.0084ms  2.0385ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE3addEv_l155
                      7.51%  148.83ms       100  1.4883ms  1.4327ms  1.7717ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE3mulEv_l132
                      7.46%  147.83ms       100  1.4783ms  1.4251ms  1.7499ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE4copyEv_l108
                      0.15%  3.0440ms         1  3.0440ms  3.0440ms  3.0440ms  __omp_offloading_fd02_c612a6__ZN9OMPStreamIdE11init_arraysEddd_l62
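
To make the arithmetic behind the two runs concrete, here is a minimal sketch of the capping step. The names kHardTeamLimit, capTeams, and chunksPerTeam are hypothetical and are not taken from the plugin source. It also assumes the work is spread evenly over the launched teams, which may not match what the runtime actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical constant mirroring the 65536 cap discussed above.
const std::int64_t kHardTeamLimit = 65536;

// Clamp the requested number of teams (CUDA blocks) to a hard limit.
inline std::int64_t capTeams(std::int64_t requested,
                             std::int64_t limit = kHardTeamLimit) {
  assert(requested >= 1 && limit >= 1);
  return std::min(requested, limit);
}

// How many team-sized chunks each launched team has to cover so that the
// originally requested amount of work is still done (ceiling division).
// Assumption: work is distributed evenly across the launched teams.
inline std::int64_t chunksPerTeam(std::int64_t requested,
                                  std::int64_t launched) {
  assert(requested >= 1 && launched >= 1);
  return (requested + launched - 1) / launched;
}
```

With the BabelStream numbers above, capTeams(262144) gives 65536, and chunksPerTeam(262144, 65536) gives 4, so under this assumption each launched block would cover four times as much work as in the uncapped run.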

  rG LLVM Github Monorepo
