[Openmp-commits] [openmp] [OpenMP][AMDGPU] Adapt dynamic callstack sizes to HIP behavior (PR #74080)

Fri Dec 1 07:10:01 PST 2023

================
@@ -1872,6 +1873,38 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
     else
       return Plugin::error("Unexpected AMDGPU wavefront %d", WavefrontSize);
 
+    // To determine the correct scratch memory size per thread, we need to check
+    // the device architecture generation. According to AOT_OFFLOADARCHS we may
+    // assume that AMDGPU offload archs are prefixed with "gfx" and suffixed
+    // with a two char arch specialization. In-between is the 1-2 char
+    // generation number we want to extract.
+    std::string CUKind{ComputeUnitKind};
+    for (auto &C : CUKind)
+      C = (char)std::tolower(C);
+
+    int GfxGen = 0;
+    if ((CUKind.find("gfx") == 0) && CUKind.length() > 5 &&
+        CUKind.length() < 8) {
+      // Cut away suffix & prefix.
+      CUKind.erase(CUKind.length() - 2, 2);
+      CUKind.erase(0, 3);
+      // Make sure we only convert digits to a number.
+      if (std::find_if(CUKind.begin(), CUKind.end(), [](unsigned char c) {
+            return !std::isdigit(c);
+          }) == CUKind.end())
+        GfxGen = std::stoi(CUKind);
+    }
+
+    // See: 'getMaxWaveScratchSize' in 'llvm/lib/Target/AMDGPU/GCNSubtarget.h'.
+    // But we need to divide by WavefrontSize.
+    if (GfxGen < 11) {
+      // 13-bit field in units of 256-dword.
+      MaxThreadScratchSize = ((256 * 4) / WavefrontSize) * ((1 << 13) - 1);
+    } else {
+      // 15-bit field in units of 64-dword.
+      MaxThreadScratchSize = ((64 * 4) / WavefrontSize) * ((1 << 15) - 1);
+    }
+
----------------
mhalk wrote:

Thanks & good point, I'll keep this in mind when addressing further feedback.
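
For reference, the logic in the quoted hunk can be sketched as two free functions, which may make it easier to test in isolation. This is only a minimal sketch, not the code in the PR: the names `parseGfxGeneration` and `maxThreadScratchSize` are hypothetical, and it assumes the same arch-name shape the hunk assumes ("gfx" + 1-2 digit generation + 2-char suffix).

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the PR): extract the generation number
// from an arch name such as "gfx90a" (9) or "gfx1100" (11).
// Returns 0 if the name does not have the expected shape.
int parseGfxGeneration(std::string Arch) {
  std::transform(Arch.begin(), Arch.end(), Arch.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });

  // Expect "gfx" + 1-2 generation chars + 2-char suffix: 6 or 7 chars total.
  if (Arch.rfind("gfx", 0) != 0 || Arch.length() < 6 || Arch.length() > 7)
    return 0;

  // Drop the 3-char prefix and the 2-char suffix.
  std::string Gen = Arch.substr(3, Arch.length() - 5);

  // Only convert if every remaining character is a digit.
  if (!std::all_of(Gen.begin(), Gen.end(),
                   [](unsigned char C) { return std::isdigit(C) != 0; }))
    return 0;
  return std::stoi(Gen);
}

// Hypothetical helper (not part of the PR): per-thread scratch limit in
// bytes, using the same arithmetic as the hunk. WavefrontSize is assumed
// to be 32 or 64.
uint32_t maxThreadScratchSize(int GfxGen, uint32_t WavefrontSize) {
  if (GfxGen < 11)
    // 13-bit field in units of 256 dwords.
    return ((256 * 4) / WavefrontSize) * ((1u << 13) - 1);
  // 15-bit field in units of 64 dwords.
  return ((64 * 4) / WavefrontSize) * ((1u << 15) - 1);
}
```

As in the hunk, a name that fails to parse yields generation 0 and therefore takes the pre-gfx11 branch. For example, `maxThreadScratchSize(parseGfxGeneration("gfx90a"), 64)` evaluates to 16 * 8191 = 131056 bytes.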

https://github.com/llvm/llvm-project/pull/74080