[Openmp-commits] [openmp] aca33b0 - [OpenMP][CUDA] Remove the hard team limit

Shilei Tian via Openmp-commits openmp-commits at lists.llvm.org
Thu Feb 10 15:07:51 PST 2022


Author: Shilei Tian
Date: 2022-02-10T18:07:46-05:00
New Revision: aca33b0b37b706a013625c92c4713b3a329d90d0

URL: https://github.com/llvm/llvm-project/commit/aca33b0b37b706a013625c92c4713b3a329d90d0
DIFF: https://github.com/llvm/llvm-project/commit/aca33b0b37b706a013625c92c4713b3a329d90d0.diff

LOG: [OpenMP][CUDA] Remove the hard team limit

Currently we have a hard team limit, which is set to 65536. Regardless of whether the device can support more teams or the user requests more, any value above this hard limit is capped to it when the kernel is launched. That is far below the actual hardware limit. For example, my workstation has a GTX2080, whose hardware grid size limit is 2147483647, which is exactly the largest value an `int32_t` can represent. The spec mentions no such limitation, so this patch simply removes it.
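
As a rough illustration of the behavior change described above, the following standalone sketch contrasts the old capping logic with the new one. The function names `blocksPerGridBefore` and `blocksPerGridAfter`, the `QueryOk` flag, and the constant name `OldHardTeamLimit` are hypothetical and exist only for this example; only the constant values (65536 and 128) and the branch structure are taken from the diff in this commit. It is not the plugin's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Values taken from the diff; the names of these free-standing constants
// are illustrative only.
constexpr int32_t OldHardTeamLimit = 1 << 16; // 65536, removed by this patch
constexpr int32_t DefaultNumTeams = 128;

// The grid size limit quoted in the log is the largest int32_t value.
static_assert(std::numeric_limits<int32_t>::max() == 2147483647,
              "int32_t max should be 2147483647");

// Hypothetical model of the pre-patch selection: fall back to the default
// if the device query failed, otherwise cap at the hard team limit.
int32_t blocksPerGridBefore(bool QueryOk, int32_t MaxGridDimX) {
  if (!QueryOk)
    return DefaultNumTeams;
  if (MaxGridDimX <= OldHardTeamLimit)
    return MaxGridDimX;
  return OldHardTeamLimit;
}

// Hypothetical model of the post-patch selection: fall back to the default
// if the device query failed, otherwise use the reported maximum as is.
int32_t blocksPerGridAfter(bool QueryOk, int32_t MaxGridDimX) {
  if (!QueryOk)
    return DefaultNumTeams;
  return MaxGridDimX;
}
```

Under these assumptions, a device reporting 2147483647 as its maximum grid dimension would previously have been limited to 65536 blocks per grid, whereas after the patch the reported value is used directly.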

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119313

Added: 
    

Modified: 
    openmp/libomptarget/plugins/cuda/src/rtl.cpp

Removed: 
    


################################################################################
diff  --git a/openmp/libomptarget/plugins/cuda/src/rtl.cpp b/openmp/libomptarget/plugins/cuda/src/rtl.cpp
index 0ca05f0ec3a0f..b688fe11ef5a7 100644
--- a/openmp/libomptarget/plugins/cuda/src/rtl.cpp
+++ b/openmp/libomptarget/plugins/cuda/src/rtl.cpp
@@ -327,10 +327,9 @@ class DeviceRTLTy {
   // Number of initial streams for each device.
   int NumInitialStreams = 32;
 
-  static constexpr const int HardTeamLimit = 1U << 16U; // 64k
-  static constexpr const int HardThreadLimit = 1024;
-  static constexpr const int DefaultNumTeams = 128;
-  static constexpr const int DefaultNumThreads = 128;
+  static constexpr const int32_t HardThreadLimit = 1024;
+  static constexpr const int32_t DefaultNumTeams = 128;
+  static constexpr const int32_t DefaultNumThreads = 128;
 
   using StreamPoolTy = ResourcePoolTy<CUstream>;
   std::vector<std::unique_ptr<StreamPoolTy>> StreamPool;
@@ -651,14 +650,9 @@ class DeviceRTLTy {
       DP("Error getting max grid dimension, use default value %d\n",
          DeviceRTLTy::DefaultNumTeams);
       DeviceData[DeviceId].BlocksPerGrid = DeviceRTLTy::DefaultNumTeams;
-    } else if (MaxGridDimX <= DeviceRTLTy::HardTeamLimit) {
+    } else {
       DP("Using %d CUDA blocks per grid\n", MaxGridDimX);
       DeviceData[DeviceId].BlocksPerGrid = MaxGridDimX;
-    } else {
-      DP("Max CUDA blocks per grid %d exceeds the hard team limit %d, capping "
-         "at the hard limit\n",
-         MaxGridDimX, DeviceRTLTy::HardTeamLimit);
-      DeviceData[DeviceId].BlocksPerGrid = DeviceRTLTy::HardTeamLimit;
     }
 
     // We are only exploiting threads along the x axis.


        

