[llvm] [AMDGPU] Correct bitshift legality transformation for small vectors (PR #140940)

Wed May 21 10:50:46 PDT 2025

llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: None (zGoldthorpe)

<details>
<summary>Changes</summary>

Fix for a bug found by the AMD fuzzing project.

During bitshift legalisation, the legaliser would try to widen the shift-amount operand of a small vector such as `<4 x i1>` to a single `i16`, because the rule checked only the total bit width and was not written with vector operands in mind. This patch adds an `isScalar()` guard so that the transformation applies only to scalar values, which lets other legalisation transformations handle the vector case.
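
To illustrate why the size check alone was not enough, here is a minimal sketch of the predicate before and after the patch. `ToyLLT`, `widenAmountBefore`, and `widenAmountAfter` are hypothetical names made up for this illustration. `ToyLLT` is a simplified stand-in for `llvm::LLT` that models only the two queries the predicate uses. It is not the real class.

```cpp
// Hypothetical, simplified stand-in for llvm::LLT (an assumption for
// illustration only); it models just isScalar() and getSizeInBits().
struct ToyLLT {
  bool IsVector;       // true for types such as <4 x i1>
  unsigned TotalBits;  // total size in bits: 4 for both i4 and <4 x i1>

  bool isScalar() const { return !IsVector; }
  unsigned getSizeInBits() const { return TotalBits; }
};

// Shape of the predicate before the patch: only total sizes are checked,
// so a 4-bit vector looks the same as a 4-bit scalar.
bool widenAmountBefore(ToyLLT ValTy, ToyLLT AmountTy) {
  return ValTy.getSizeInBits() <= 16 && AmountTy.getSizeInBits() < 16;
}

// Shape of the predicate after the patch: vector value types are rejected
// before the size checks are consulted.
bool widenAmountAfter(ToyLLT ValTy, ToyLLT AmountTy) {
  return ValTy.isScalar() && ValTy.getSizeInBits() <= 16 &&
         AmountTy.getSizeInBits() < 16;
}
```

In this model, both versions accept an `i4` shift, but only the old version accepts a `<4 x i1>` shift. That matches the behaviour change described above.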

---
Full diff: https://github.com/llvm/llvm-project/pull/140940.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (+1-1) 
- (added) llvm/test/CodeGen/AMDGPU/widen-vector-shift.ll (+24) 


``````````diff

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 667c466a998e0..eeb05f0acebed 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -1765,7 +1765,7 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
         // 32-bit amount.
         const LLT ValTy = Query.Types[0];
         const LLT AmountTy = Query.Types[1];
-        return ValTy.getSizeInBits() <= 16 &&
+        return ValTy.isScalar() && ValTy.getSizeInBits() <= 16 &&
                AmountTy.getSizeInBits() < 16;
       }, changeTo(1, S16));
     Shifts.maxScalarIf(typeIs(0, S16), 1, S16);
diff --git a/llvm/test/CodeGen/AMDGPU/widen-vector-shift.ll b/llvm/test/CodeGen/AMDGPU/widen-vector-shift.ll
new file mode 100644
index 0000000000000..1d40038abe911
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/widen-vector-shift.ll
@@ -0,0 +1,24 @@
+; RUN: llc -global-isel -mtriple=amdgcn -mcpu=gfx90a -O0 -print-after=legalizer %s -o /dev/null 2>&1 | FileCheck %s
+
+; CHECK-LABEL: widen_ashr_i4:
+define amdgpu_kernel void @widen_ashr_i4(
+    ptr addrspace(1) %res, i4 %a, i4 %b) {
+; CHECK: G_ASHR %{{[0-9]+}}:_, %{{[0-9]+}}:_(s16)
+entry:
+  %res.val = ashr i4 %a, %b
+  store i4 %res.val, ptr addrspace(1) %res
+  ret void
+}
+
+; CHECK-LABEL: widen_ashr_v4i1:
+define amdgpu_kernel void @widen_ashr_v4i1(
+    ptr addrspace(1) %res, <4 x i1> %a, <4 x i1> %b) {
+; CHECK: G_ASHR %{{[0-9]+}}:_, %{{[0-9]+}}:_(s16)
+; CHECK: G_ASHR %{{[0-9]+}}:_, %{{[0-9]+}}:_(s16)
+; CHECK: G_ASHR %{{[0-9]+}}:_, %{{[0-9]+}}:_(s16)
+; CHECK: G_ASHR %{{[0-9]+}}:_, %{{[0-9]+}}:_(s16)
+entry:
+  %res.val = ashr <4 x i1> %a, %b
+  store <4 x i1> %res.val, ptr addrspace(1) %res
+  ret void
+}

``````````

</details>


https://github.com/llvm/llvm-project/pull/140940