[llvm] [AMDGPU] Fix computation of waves/EU maximum (PR #140921)

via llvm-commits llvm-commits at lists.llvm.org
Wed May 21 08:58:39 PDT 2025


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-amdgpu

Author: Lucas Ramirez (lucas-rami)

<details>
<summary>Changes</summary>

This fixes an issue in the waves/EU range calculation: when the `amdgpu-waves-per-eu` attribute is present and valid, it could be ignored entirely if workgroup sizes and LDS usage restrict the maximum achievable occupancy below the subtarget maximum. In such cases, we should still honor the requested minimum number of waves/EU, even if the requested maximum is higher than the achievable maximum (as long as it is still within the subtarget specification).

As such, the waves/EU range of the added unit test `empty_at_least_2_lds_limited` should be [2,4] after this patch; it is currently [1,4] (i.e., as if `amdgpu-waves-per-eu` was not specified at all).

Before e377dc4 the default maximum waves/EU was always set to the subtarget maximum, trivially avoiding the issue.
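
The change in behavior can be sketched outside of LLVM as two free functions. This is a minimal illustration, not the in-tree code: the names `effectiveWavesPerEUOld`, `effectiveWavesPerEUNew`, `Range`, and the `SubtargetMax` parameter (standing in for `getMaxWavesPerEU()`) are hypothetical, and the checks are transcribed from the removed and added lines of the diff below.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical alias for a {min, max} waves/EU pair.
using Range = std::pair<unsigned, unsigned>;

// Pre-patch checks (transcribed from the removed diff lines).
// Default is the range implied by workgroup sizes and LDS usage.
Range effectiveWavesPerEUOld(Range Requested, Range Default) {
  assert(Default.first <= Default.second);
  // Requested min must not exceed requested max (when a max is given).
  if (Requested.second && Requested.first > Requested.second)
    return Default;
  // Any requested max above the achievable max discards the whole request.
  if (Requested.first < Default.first || Requested.second > Default.second)
    return Default;
  return Requested;
}

// Post-patch checks (transcribed from the added diff lines).
// SubtargetMax is an assumed stand-in for getMaxWavesPerEU().
Range effectiveWavesPerEUNew(Range Requested, Range Default,
                             unsigned SubtargetMax) {
  assert(Default.first <= Default.second);
  // Requested min must lie within the default range.
  if (Requested.first < Default.first || Requested.first > Default.second)
    return Default;
  // When provided, requested max must be >= min and within the subtarget spec.
  if (Requested.second && (Requested.first > Requested.second ||
                           Requested.second > SubtargetMax))
    return Default;
  // Clamp the max to what workgroup sizes and LDS usage actually allow.
  Requested.second = std::min(Requested.second, Default.second);
  return Requested;
}
```

Assuming, for illustration only, a subtarget maximum of 10, a default range of [1,4], and a request that arrives as [2,10], the old function returns [1,4] because the requested maximum exceeds the achievable one, while the new function keeps the requested minimum and returns [2,4].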

---
Full diff: https://github.com/llvm/llvm-project/pull/140921.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (+11-7) 
- (modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll (+12) 


``````````diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
index 776cc6258dbcd..2131625959827 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
@@ -191,17 +191,21 @@ std::pair<unsigned, unsigned> AMDGPUSubtarget::getEffectiveWavesPerEU(
       getOccupancyWithWorkGroupSizes(LDSBytes, FlatWorkGroupSizes).second};
   Default.first = std::min(Default.first, Default.second);
 
-  // Make sure requested minimum is less than requested maximum.
-  if (RequestedWavesPerEU.second &&
-      RequestedWavesPerEU.first > RequestedWavesPerEU.second)
+  // Make sure requested min is within the default range.
+  if (RequestedWavesPerEU.first < Default.first ||
+      RequestedWavesPerEU.first > Default.second)
     return Default;
 
-  // Make sure requested values do not violate subtarget's specifications and
-  // are compatible with values implied by minimum/maximum flat workgroup sizes.
-  if (RequestedWavesPerEU.first < Default.first ||
-      RequestedWavesPerEU.second > Default.second)
+  // When provided, make sure requested max is higher than min and does not
+  // violate target specification.
+  if (RequestedWavesPerEU.second &&
+      (RequestedWavesPerEU.first > RequestedWavesPerEU.second ||
+       RequestedWavesPerEU.second > getMaxWavesPerEU()))
     return Default;
 
+  // We cannot exceed maximum occupancy implied by flat workgroup size and LDS.
+  RequestedWavesPerEU.second =
+      std::min(RequestedWavesPerEU.second, Default.second);
   return RequestedWavesPerEU;
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll b/llvm/test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll
index 4507fd5865989..eff424ae02c81 100644
--- a/llvm/test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll
+++ b/llvm/test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll
@@ -200,3 +200,15 @@ entry:
   ret void
 }
 attributes #10 = {"amdgpu-flat-work-group-size"="256,256" "amdgpu-waves-per-eu"="2,2"}
+
+; Minimum 2 waves, maximum limited by LDS usage.
+; CHECK-LABEL: {{^}}empty_at_least_2_lds_limited:
+; CHECK: SGPRBlocks: 12
+; CHECK: VGPRBlocks: 12
+; CHECK: NumSGPRsForWavesPerEU: 102
+; CHECK: NumVGPRsForWavesPerEU: 49
+define amdgpu_kernel void @empty_at_least_2_lds_limited() #11 {
+entry:
+  ret void
+}
+attributes #11 = {"amdgpu-flat-work-group-size"="1,256" "amdgpu-waves-per-eu"="2" "amdgpu-lds-size"="16384"}

``````````

</details>


https://github.com/llvm/llvm-project/pull/140921

