[clang] [llvm] [mlir] [AMDGPU] Verify AMDGPU required workgroup size matches flat workgroup size (PR #200989)

Mon Jun 8 05:15:09 PDT 2026

================
@@ -51,6 +51,9 @@ __attribute__((amdgpu_waves_per_eu(0, 4))) kernel void kernel_waves_per_eu_0_4()
 __attribute__((amdgpu_flat_work_group_size(64, 32))) kernel void kernel_flat_work_group_size_64_32() {} // expected-error {{'amdgpu_flat_work_group_size' attribute argument is invalid: min must not be greater than max}}
 __attribute__((amdgpu_waves_per_eu(4, 2))) kernel void kernel_waves_per_eu_4_2() {} // expected-error {{'amdgpu_waves_per_eu' attribute argument is invalid: min must not be greater than max}}
 
+__attribute__((reqd_work_group_size(32, 2, 1), amdgpu_flat_work_group_size(16, 128))) kernel void kernel_reqd_work_group_size_32_2_1_flat_work_group_size_16_128() {} // expected-error {{'amdgpu_flat_work_group_size' attribute must match 'reqd_work_group_size' product}}
+__attribute__((amdgpu_flat_work_group_size(16, 128), reqd_work_group_size(32, 2, 1))) kernel void kernel_flat_work_group_size_16_128_reqd_work_group_size_32_2_1() {} // expected-error {{'amdgpu_flat_work_group_size' attribute must match 'reqd_work_group_size' product}}
+
----------------
steffenlarsen wrote:

Would be great to also see a .cpp or .cu test for this new diagnostic. In particular, it would be good to test that dependent functions can also detect the mismatch. Maybe something like:
```c++
template <unsigned a, unsigned b, unsigned c, unsigned d, unsigned e>
__attribute__((reqd_work_group_size(a, b, c), amdgpu_flat_work_group_size(d, e))) __global__ void f() {}

template __global__ void f<32, 2, 1, 16, 128>(void);
```
and some variants thereof?

https://github.com/llvm/llvm-project/pull/200989