[llvm] [AMDGPU] Select v_lshl_add_u32 instead of v_mul_lo_u32 by constant (PR #71035)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 2 01:28:26 PDT 2023
================
@@ -1194,7 +1206,7 @@ define amdgpu_kernel void @sub_i32_constant(ptr addrspace(1) %out, ptr addrspace
; GFX10W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10W64-NEXT: s_waitcnt vmcnt(0)
; GFX10W64-NEXT: v_readfirstlane_b32 s2, v1
-; GFX10W64-NEXT: v_mul_u32_u24_e32 v0, 5, v0
+; GFX10W64-NEXT: v_lshl_add_u32 v0, v0, 2, v0
----------------
arsenm wrote:
This is worse: the 24-bit multiply instructions are fast and have a smaller encoding.
https://github.com/llvm/llvm-project/pull/71035
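
For readers following the hunk above, here is a rough sketch of why the two selections compute the same value in this test. It is a simplified Python model, not the ISA definition. The helper names mul_u32_u24 and lshl_add_u32 are made up for illustration, and the sketch assumes the shift amount is below 32 and that the multiply only sees the low 24 bits of each operand.

```python
MASK32 = 0xFFFFFFFF  # results are truncated to 32 bits
MASK24 = 0x00FFFFFF  # the 24-bit multiply only reads the low 24 bits


def mul_u32_u24(a: int, b: int) -> int:
    """Simplified model of v_mul_u32_u24: low 32 bits of a[23:0] * b[23:0]."""
    return ((a & MASK24) * (b & MASK24)) & MASK32


def lshl_add_u32(a: int, shift: int, c: int) -> int:
    """Simplified model of v_lshl_add_u32: (a << shift) + c, wrapped to 32 bits.

    Assumes 0 <= shift < 32 (an assumption of this sketch).
    """
    return (((a << shift) & MASK32) + c) & MASK32


# For inputs that fit in 24 bits, x * 5 and (x << 2) + x agree.
for x in (0, 1, 7, 63, 0x00FFFFFF):
    assert mul_u32_u24(5, x) == lshl_add_u32(x, 2, x) == (5 * x) & MASK32

# Once bit 24 is set, the 24-bit multiply drops it and the two forms diverge.
wide = 1 << 24
print(hex(mul_u32_u24(5, wide)), hex(lshl_add_u32(wide, 2, wide)))
```

Since both forms give the same result for 24-bit inputs, the choice between them comes down to cost, which is the objection raised in the comment: the `_e32` form of the multiply uses the 32-bit encoding, while the three-operand shift-add needs the larger one.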