[Openmp-commits] [PATCH] D158802: [OpenMP] Honor `thread_limit` value when choosing grid size

Shilei Tian via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Thu Aug 24 18:14:33 PDT 2023


tianshilei1992 created this revision.
tianshilei1992 added reviewers: jdoerfert, jhuber6.
Herald added subscribers: guansong, yaxunl.
Herald added a project: All.
tianshilei1992 requested review of this revision.
Herald added subscribers: openmp-commits, jplehr, sstefan1.
Herald added a project: OpenMP.

D152014 <https://reviews.llvm.org/D152014> introduced an optimization that favors more, smaller
blocks over fewer, larger blocks, even when the user sets `thread_limit`
explicitly. This patch changes that behavior so the user-provided value is
honored: `NumThreads` is no longer reduced when it comes from the
`thread_limit` clause.
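
As a rough illustration of the intended effect, the following standalone
sketch models only the guarded clamp and the ceiling division visible in the
diff below. The helper names `clampThreads` and `blocksForTripCount` are
hypothetical and do not exist in libomptarget; the surrounding branch
conditions of `getNumBlocks` are not reproduced here.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of the patch): shrink the block size to
// Candidate unless the user fixed it via the thread_limit clause.
uint32_t clampThreads(uint32_t NumThreads, uint32_t Candidate,
                      bool IsNumThreadsFromUser) {
  if (IsNumThreadsFromUser)
    return NumThreads; // Honor the user-provided value as-is.
  return std::min(NumThreads, Candidate);
}

// Hypothetical helper: the same ceiling division the patch uses to derive
// the block count from the loop trip count.
// Assumes LoopTripCount > 0 and NumThreads > 0.
uint64_t blocksForTripCount(uint64_t LoopTripCount, uint32_t NumThreads) {
  return ((LoopTripCount - 1) / NumThreads) + 1;
}
```

Under these assumptions, a trip count of 1000 with 256 threads and a candidate
of 32 yields 32 blocks of 32 threads when the value is not from the user, and
4 blocks of 256 threads when it is.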


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158802

Files:
  openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
  openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h


Index: openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
===================================================================
--- openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
+++ openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
@@ -335,9 +335,11 @@
                          uint32_t ThreadLimitClause[3]) const;
 
   /// The number of threads \p NumThreads can be adjusted by this method.
+  /// \p IsNumThreadsFromUser is true if \p NumThreads is defined by the user
+  /// via the thread_limit clause.
   uint64_t getNumBlocks(GenericDeviceTy &GenericDevice,
                         uint32_t BlockLimitClause[3], uint64_t LoopTripCount,
-                        uint32_t &NumThreads) const;
+                        uint32_t &NumThreads, bool IsNumThreadsFromUser) const;
 
   /// Indicate if the kernel works in Generic SPMD, Generic or SPMD mode.
   bool isGenericSPMDMode() const {
Index: openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
===================================================================
--- openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
+++ openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
@@ -327,8 +327,9 @@
                                     KernelArgs.NumArgs, Args, Ptrs);
 
   uint32_t NumThreads = getNumThreads(GenericDevice, KernelArgs.ThreadLimit);
-  uint64_t NumBlocks = getNumBlocks(GenericDevice, KernelArgs.NumTeams,
-                                    KernelArgs.Tripcount, NumThreads);
+  uint64_t NumBlocks =
+      getNumBlocks(GenericDevice, KernelArgs.NumTeams, KernelArgs.Tripcount,
+                   NumThreads, NumThreads == KernelArgs.ThreadLimit[0]);
 
   if (auto Err =
           printLaunchInfo(GenericDevice, KernelArgs, NumThreads, NumBlocks))
@@ -371,7 +372,8 @@
 uint64_t GenericKernelTy::getNumBlocks(GenericDeviceTy &GenericDevice,
                                        uint32_t NumTeamsClause[3],
                                        uint64_t LoopTripCount,
-                                       uint32_t &NumThreads) const {
+                                       uint32_t &NumThreads,
+                                       bool IsNumThreadsFromUser) const {
   assert(NumTeamsClause[1] == 0 && NumTeamsClause[2] == 0 &&
          "Multi dimensional launch not supported yet.");
 
@@ -412,13 +414,17 @@
         auto NumThreadsDefaultBlocksP2 =
             llvm::PowerOf2Ceil(NumThreadsDefaultBlocks);
         // Do not increase a thread limit given be the user.
-        NumThreads = std::min(NumThreads, uint32_t(NumThreadsDefaultBlocksP2));
+        if (!IsNumThreadsFromUser)
+          NumThreads =
+              std::min(NumThreads, uint32_t(NumThreadsDefaultBlocksP2));
         assert(NumThreads >= MinThreads &&
                "Expected sufficient inner parallelism.");
         TripCountNumBlocks = ((LoopTripCount - 1) / NumThreads) + 1;
       } else {
-        // Not enough parallelism for teams and threads, limit both.
-        NumThreads = std::min(NumThreads, MinThreads);
+        // Not enough parallelism for teams and threads, limit both if the
+        // NumThreads value is not from the user.
+        if (!IsNumThreadsFromUser)
+          NumThreads = std::min(NumThreads, MinThreads);
         TripCountNumBlocks = ((LoopTripCount - 1) / NumThreads) + 1;
       }
 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D158802.553322.patch
Type: text/x-patch
Size: 3435 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20230825/614e79dd/attachment-0001.bin>

