[llvm] [AMDGPU] Adding multiple use analysis to SIPeepholeSDWA (PR #94800)

Brian Favela via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 7 13:22:24 PDT 2024


================
@@ -1557,7 +1557,8 @@ define amdgpu_kernel void @mac_v2half(ptr addrspace(1) %out, ptr addrspace(1) %i
 ; GFX89-NEXT:    s_waitcnt vmcnt(1)
 ; GFX89-NEXT:    v_lshrrev_b32_e32 v4, 16, v2
 ; GFX89-NEXT:    s_waitcnt vmcnt(0)
-; GFX89-NEXT:    v_mac_f16_sdwa v4, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX89-NEXT:    v_lshrrev_b32_e32 v5, 16, v3
----------------
bfavela wrote:

This was a regrettable regression. The SDWA pass sees these as "mul/add" instead of an FMA, so SIPeepholeSDWA was not able to optimize them properly. This could be fixed in a future change, however.

https://github.com/llvm/llvm-project/pull/94800

