[llvm] [AMDGPU] Optimize rotate/funnel shift pattern matching in instruction selection (PR #149817)
Juan Manuel Martinez Caamaño via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 31 03:05:21 PDT 2025
================
@@ -210,19 +247,27 @@ define amdgpu_kernel void @rotl_v4i32(ptr addrspace(1) %in, <4 x i32> %x, <4 x i
; GFX8-NEXT: s_load_dwordx8 s[8:15], s[4:5], 0x34
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x24
; GFX8-NEXT: s_waitcnt lgkmcnt(0)
-; GFX8-NEXT: s_sub_i32 s5, 32, s15
-; GFX8-NEXT: s_sub_i32 s4, 32, s14
-; GFX8-NEXT: v_mov_b32_e32 v0, s5
-; GFX8-NEXT: s_sub_i32 s3, 32, s13
-; GFX8-NEXT: v_alignbit_b32 v3, s11, s11, v0
-; GFX8-NEXT: v_mov_b32_e32 v0, s4
-; GFX8-NEXT: s_sub_i32 s2, 32, s12
-; GFX8-NEXT: v_alignbit_b32 v2, s10, s10, v0
-; GFX8-NEXT: v_mov_b32_e32 v0, s3
-; GFX8-NEXT: v_alignbit_b32 v1, s9, s9, v0
-; GFX8-NEXT: v_mov_b32_e32 v0, s2
+; GFX8-NEXT: s_lshl_b32 s2, s8, s12
----------------
jmmartinez wrote:
It looks like the lowering proposed here generates more code than before. Is this expected/desired?
Can we try to match the previous code?
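For context, the removed GFX8 lines lower each lane's rotate-left by `n` as `s_sub_i32 t, 32, n` followed by `v_alignbit_b32 v, x, x, t`. That is a rotate right by `32 - n`, which equals a rotate left by `n`. The new hunk (truncated above) starts with `s_lshl_b32`, which suggests a shift/or expansion instead. The minimal Python model below checks that the two forms agree. It assumes `v_alignbit_b32` returns the low 32 bits of the 64-bit concatenation `{src0, src1}` shifted right by `src2[4:0]`. The helper names `alignbit`, `rotl_alignbit` and `rotl_shift_or` are hypothetical and only serve this illustration.

```python
MASK32 = 0xFFFFFFFF


def alignbit(hi: int, lo: int, s: int) -> int:
    """Hypothetical model of v_alignbit_b32 (assumed semantics):
    low 32 bits of {hi, lo} shifted right by s[4:0]."""
    cat = ((hi & MASK32) << 32) | (lo & MASK32)
    return (cat >> (s & 31)) & MASK32


def rotl_alignbit(x: int, n: int) -> int:
    """Removed sequence: t = 32 - n, then alignbit(x, x, t), i.e. rotr by 32 - n."""
    return alignbit(x, x, (32 - n) & MASK32)


def rotl_shift_or(x: int, n: int) -> int:
    """Generic expansion (x << n) | (x >> (32 - n)); the amounts are masked
    to 0..31 so that n == 0 returns x unchanged."""
    x &= MASK32
    n &= 31
    return ((x << n) | (x >> ((32 - n) & 31))) & MASK32


# Both forms agree for every rotate amount, including n == 0 and n >= 32.
for x in (0x00000001, 0x80000000, 0x12345678, 0xDEADBEEF):
    for n in range(64):
        assert rotl_alignbit(x, n) == rotl_shift_or(x, n)

print(hex(rotl_alignbit(0x12345678, 8)))  # prints 0x34567812
```

Because `v_alignbit_b32` only reads the low five bits of its shift operand, the `32 - n` subtraction needs no extra masking when `n` is 0. In that case the amount becomes 32, which reads as 0, and `x` is returned unchanged. This is one reason the alignbit form stays so compact.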
https://github.com/llvm/llvm-project/pull/149817