[clang] [llvm] [OpenMP] Remove 'minncta' attributes from NVPTX kernels (PR #88398)

Thu Apr 11 08:17:10 PDT 2024

https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/88398

Summary:
Currently we treat this attribute as a minimum number for the amount of
blocks scheduled on the kernel. However, the doucmentation states that
this applies to CTA's mapped onto a *single* SM. Currently we just set
it to the total number of blocks, which will almost always result in a
warning that the value is out of range and will be ignored. We don't
have a good way to automatically know how many CTAs can be put on a
single SM nor if we should do this, so we should probably leave this up
to users manually adding it.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm


>From b1229aeb26c67b73bf3af6449f7f4eb9d0ad2c6a Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Thu, 11 Apr 2024 10:14:00 -0500
Subject: [PATCH] [OpenMP] Remove 'minncta' attributes from NVPTX kernels

Summary:
Currently we treat this attribute as a minimum number for the amount of
blocks scheduled on the kernel. However, the doucmentation states that
this applies to CTA's mapped onto a *single* SM. Currently we just set
it to the total number of blocks, which will almost always result in a
warning that the value is out of range and will be ignored. We don't
have a good way to automatically know how many CTAs can be put on a
single SM nor if we should do this, so we should probably leave this up
to users manually adding it.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm
---
 clang/test/OpenMP/ompx_attributes_codegen.cpp | 3 +--
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp     | 4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/clang/test/OpenMP/ompx_attributes_codegen.cpp b/clang/test/OpenMP/ompx_attributes_codegen.cpp
index 6735972c6b1070..87eb2913537ba5 100644
--- a/clang/test/OpenMP/ompx_attributes_codegen.cpp
+++ b/clang/test/OpenMP/ompx_attributes_codegen.cpp
@@ -36,6 +36,5 @@ void func() {
 // NVIDIA: "omp_target_thread_limit"="45"
 // NVIDIA: "omp_target_thread_limit"="17"
 // NVIDIA: !{ptr @__omp_offloading[[HASH1:.*]]_l16, !"maxntidx", i32 20}
-// NVIDIA: !{ptr @__omp_offloading[[HASH2:.*]]_l18, !"minctasm", i32 90}
-// NVIDIA: !{ptr @__omp_offloading[[HASH2]]_l18, !"maxntidx", i32 45}
+// NVIDIA: !{ptr @__omp_offloading[[HASH2:.*]]_l18, !"maxntidx", i32 45}
 // NVIDIA: !{ptr @__omp_offloading[[HASH3:.*]]_l20, !"maxntidx", i32 17}
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 7fd8474c2ec890..4d2d352f7520b2 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4786,11 +4786,9 @@ OpenMPIRBuilder::readTeamBoundsForKernel(const Triple &, Function &Kernel) {
 
 void OpenMPIRBuilder::writeTeamsForKernel(const Triple &T, Function &Kernel,
                                           int32_t LB, int32_t UB) {
-  if (T.isNVPTX()) {
+  if (T.isNVPTX())
     if (UB > 0)
       updateNVPTXMetadata(Kernel, "maxclusterrank", UB, true);
-    updateNVPTXMetadata(Kernel, "minctasm", LB, false);
-  }
   if (T.isAMDGPU())
     Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");