[llvm] [AMDGPU] Adding multiple use analysis to SIPeepholeSDWA (PR #94800)
Brian Favela via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 7 13:22:24 PDT 2024
================
@@ -1557,7 +1557,8 @@ define amdgpu_kernel void @mac_v2half(ptr addrspace(1) %out, ptr addrspace(1) %i
; GFX89-NEXT: s_waitcnt vmcnt(1)
; GFX89-NEXT: v_lshrrev_b32_e32 v4, 16, v2
; GFX89-NEXT: s_waitcnt vmcnt(0)
-; GFX89-NEXT: v_mac_f16_sdwa v4, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX89-NEXT: v_lshrrev_b32_e32 v5, 16, v3
----------------
bfavela wrote:
This is a regrettable regression. The SDWA pass sees these as "mul/add" rather than as an FMA, so SIPeepholeSDWA was not able to optimize it properly. It could be fixed in a future change, however.
https://github.com/llvm/llvm-project/pull/94800