[llvm] [AMDGPU] Add pattern to select scalar ops for fshr with uniform operands (PR #165295)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 31 09:12:41 PDT 2025
================
@@ -195,23 +201,47 @@ define amdgpu_kernel void @fshr_i32_imm(ptr addrspace(1) %in, i32 %x, i32 %y) {
; GFX10-NEXT: global_store_dword v0, v1, s[0:1]
; GFX10-NEXT: s_endpgm
;
-; GFX11-LABEL: fshr_i32_imm:
-; GFX11: ; %bb.0: ; %entry
-; GFX11-NEXT: s_load_b128 s[0:3], s[4:5], 0x24
-; GFX11-NEXT: v_mov_b32_e32 v0, 0
-; GFX11-NEXT: s_waitcnt lgkmcnt(0)
-; GFX11-NEXT: v_alignbit_b32 v1, s2, s3, 7
-; GFX11-NEXT: global_store_b32 v0, v1, s[0:1]
-; GFX11-NEXT: s_endpgm
-;
-; GFX12-LABEL: fshr_i32_imm:
-; GFX12: ; %bb.0: ; %entry
-; GFX12-NEXT: s_load_b128 s[0:3], s[4:5], 0x24
-; GFX12-NEXT: v_mov_b32_e32 v0, 0
-; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: v_alignbit_b32 v1, s2, s3, 7
-; GFX12-NEXT: global_store_b32 v0, v1, s[0:1]
-; GFX12-NEXT: s_endpgm
+; GFX11-TRUE16-LABEL: fshr_i32_imm:
+; GFX11-TRUE16: ; %bb.0: ; %entry
+; GFX11-TRUE16-NEXT: s_load_b128 s[0:3], s[4:5], 0x24
+; GFX11-TRUE16-NEXT: v_mov_b32_e32 v0, 0
+; GFX11-TRUE16-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-TRUE16-NEXT: v_alignbit_b32 v1, s2, s3, 7
+; GFX11-TRUE16-NEXT: global_store_b32 v0, v1, s[0:1]
+; GFX11-TRUE16-NEXT: s_endpgm
+;
+; GFX11-FAKE16-LABEL: fshr_i32_imm:
+; GFX11-FAKE16: ; %bb.0: ; %entry
+; GFX11-FAKE16-NEXT: s_load_b128 s[0:3], s[4:5], 0x24
+; GFX11-FAKE16-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-FAKE16-NEXT: s_mov_b32 s4, s3
+; GFX11-FAKE16-NEXT: s_mov_b32 s5, s2
+; GFX11-FAKE16-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
+; GFX11-FAKE16-NEXT: s_lshr_b64 s[2:3], s[4:5], 7
----------------
jayfoad wrote:
I guess SIFoldOperands managed to fold away the s_and_b32 you generated here? It would still be better not to generate it in the first place, when the shift amount is constant.
https://github.com/llvm/llvm-project/pull/165295
More information about the llvm-commits
mailing list