[Openmp-commits] [openmp] [OpenMP][AMDGPU] Adapt dynamic callstack sizes to HIP behavior (PR #74080)
Michael Halkenhäuser via Openmp-commits
openmp-commits at lists.llvm.org
Fri Dec 1 07:10:01 PST 2023
================
@@ -1872,6 +1873,38 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
else
return Plugin::error("Unexpected AMDGPU wavefront %d", WavefrontSize);
+ // To determine the correct scratch memory size per thread, we need to check
+ // the device architecture generation. According to AOT_OFFLOADARCHS we may
+ // assume that AMDGPU offload archs are prefixed with "gfx" and suffixed
+ // with a two char arch specialization. In-between is the 1-2 char
+ // generation number we want to extract.
+ std::string CUKind{ComputeUnitKind};
+ for (auto &C : CUKind)
+ C = (char)std::tolower(C);
+
+ int GfxGen = 0;
+ if ((CUKind.find("gfx") == 0) && CUKind.length() > 5 &&
+ CUKind.length() < 8) {
+ // Cut away suffix & prefix.
+ CUKind.erase(CUKind.length() - 2, 2);
+ CUKind.erase(0, 3);
+ // Make sure we only convert digits to a number.
+ if (std::find_if(CUKind.begin(), CUKind.end(), [](unsigned char c) {
+ return !std::isdigit(c);
+ }) == CUKind.end())
+ GfxGen = std::stoi(CUKind);
+ }
+
+ // See: 'getMaxWaveScratchSize' in 'llvm/lib/Target/AMDGPU/GCNSubtarget.h'.
+ // But we need to divide by WavefrontSize.
+ if (GfxGen < 11) {
+ // 13-bit field in units of 256-dword.
+ MaxThreadScratchSize = ((256 * 4) / WavefrontSize) * ((1 << 13) - 1);
+ } else {
+ // 15-bit field in units of 64-dword.
+ MaxThreadScratchSize = ((64 * 4) / WavefrontSize) * ((1 << 15) - 1);
+ }
+
----------------
mhalk wrote:
Thanks & good point, I'll keep this in mind when addressing further feedback.
https://github.com/llvm/llvm-project/pull/74080
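
For readers following the hunk above, here is a minimal standalone sketch of the same logic, written outside the plugin so it can be compiled on its own. The helper names parseGfxGeneration and maxThreadScratchSize are hypothetical and do not appear in the patch. The sketch assumes, as the quoted comment does, that arch names look like "gfx" + 1-2 digit generation + two-character suffix. The wavefront-size assertion mirrors the patch's earlier rejection of unexpected wavefront sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the patch): extract the generation number
// from an arch name such as "gfx90a" or "gfx1100". Returns 0 when the name
// does not match the assumed "gfx" + generation + 2-char suffix layout.
int parseGfxGeneration(std::string Arch) {
  std::transform(Arch.begin(), Arch.end(), Arch.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });

  // Same bounds as the patch: total length must be 6 or 7 characters.
  if (Arch.rfind("gfx", 0) != 0 || Arch.length() < 6 || Arch.length() > 7)
    return 0;

  // Drop the 3-char prefix and the 2-char suffix.
  std::string Gen = Arch.substr(3, Arch.length() - 5);

  // Only convert when every remaining character is a digit.
  bool AllDigits = std::all_of(Gen.begin(), Gen.end(), [](unsigned char C) {
    return std::isdigit(C) != 0;
  });
  return AllDigits ? std::stoi(Gen) : 0;
}

// Hypothetical helper (not part of the patch): per-thread scratch limit in
// bytes, using the same arithmetic as the hunk. A dword is 4 bytes, and the
// per-wave limit is divided by the wavefront size.
uint32_t maxThreadScratchSize(int GfxGen, uint32_t WavefrontSize) {
  assert(WavefrontSize == 32 || WavefrontSize == 64);
  if (GfxGen < 11) {
    // 13-bit field in units of 256 dwords.
    return ((256 * 4) / WavefrontSize) * ((1u << 13) - 1);
  }
  // 15-bit field in units of 64 dwords.
  return ((64 * 4) / WavefrontSize) * ((1u << 15) - 1);
}
```

With these helpers, "gfx90a" parses to generation 9 and yields 16 * 8191 = 131056 bytes per thread at wavefront size 64, while "gfx1100" parses to generation 11 and yields 8 * 32767 = 262136 bytes at wavefront size 32. Note that an unparsable name returns 0 and therefore takes the pre-gfx11 branch, which matches the behavior of the quoted code.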