[clang] [clang][CodeGen][SPIRV] Translate `amdgpu_flat_work_group_size` into `reqd_work_group_size`. (PR #116820)

Fri Dec 20 01:38:45 PST 2024

================
@@ -245,6 +247,41 @@ SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace(CodeGenModule &CGM,
   return DefaultGlobalAS;
 }
 
+void SPIRVTargetCodeGenInfo::setTargetAttributes(
+    const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
+  if (!M.getLangOpts().HIP ||
+      M.getTarget().getTriple().getVendor() != llvm::Triple::AMD)
+    return;
+  if (GV->isDeclaration())
+    return;
+
+  auto F = dyn_cast<llvm::Function>(GV);
+  if (!F)
+    return;
+
+  auto FD = dyn_cast_or_null<FunctionDecl>(D);
+  if (!FD)
+    return;
+  if (!FD->hasAttr<CUDAGlobalAttr>())
+    return;
+
+  unsigned N = M.getLangOpts().GPUMaxThreadsPerBlock;
+  if (auto FlatWGS = FD->getAttr<AMDGPUFlatWorkGroupSizeAttr>())
+    N = FlatWGS->getMax()->EvaluateKnownConstInt(M.getContext()).getExtValue();
+
+  // We encode the maximum flat WG size in the first component of the 3D
+  // max_work_group_size attribute, which will get reverse translated into the
+  // original AMDGPU attribute when targeting AMDGPU.
----------------
AlexVlx wrote:

We are talking across eachother. I am saying that the SPIR-V attribute cannot be generated via Clang, i.e. that you cannot write `__attribute__((foo))` in your source and obtain `max_work_group_size` metadata, at the moment. Furthermore, from the implementation of Clang's `__launch_bounds__`:
 
```cpp
// An AST node is created for this attribute, but is not used by other parts
// of the compiler. However, this node needs to exist in the AST because
// non-LLVM backends may be relying on the attribute's presence.
```

So this is a glorified annotation / we'd still have to decide on how to lower it into IR, which would likely end up atop flat workgroup size, unless we choose to spam yet another attribute. We also use flat workgroup size implicitly to control / implement `--gpu-max-threads-per-block`, which is important for correctness, and is in a fairly similar place with `__launch_bounds__` (it's always 1D, doesn't have a minimum etc.). It's also not handled by this patch, so I'll have to add it:)

That being said, the idea in #91468 is sound, but it will require a bit of work to get done; I think we'd still have to choose a way to pass the info through SPIR-V (what this PR tries to do).

https://github.com/llvm/llvm-project/pull/116820