[llvm] e6f9483 - [SelectionDAG] Flags are dropped when creating a new FMUL (#66701)

Thu Sep 21 08:26:40 PDT 2023

Author: Sirish Pande
Date: 2023-09-21T10:26:34-05:00
New Revision: e6f9483f77dbbdfdc010f8db2bbb0e236820eddb

URL: https://github.com/llvm/llvm-project/commit/e6f9483f77dbbdfdc010f8db2bbb0e236820eddb
DIFF: https://github.com/llvm/llvm-project/commit/e6f9483f77dbbdfdc010f8db2bbb0e236820eddb.diff

LOG: [SelectionDAG] Flags are dropped when creating a new FMUL (#66701)

While simplifying vector operations in DAG combine, we may need to
create new nodes for the simplified vectors. When we do, we need to
make sure that the flags of the old node are copied to the new node,
or adjusted where necessary.

If "contract" is dropped from an instruction like FMUL, it may no longer
be fused into an FMA instruction, which would hurt performance.

Here's an example where the "contract" flag is dropped when a new FMUL is
created.

Replacing.2 t42: v2f32 = fmul contract t41, t38
With: t48: v2f32 = fmul t38, t38
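
The effect of the dropped flag can be illustrated with a small standalone
model. This is only a sketch under stated assumptions: Node, FlagContract,
rebuildDroppingFlags, rebuildKeepingFlags and canFuseIntoFMA are
hypothetical names made up for illustration and are not LLVM's SDNode or
SelectionDAG API. The real fusion decision also depends on other
conditions that are not modelled here.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flag bits; a stand-in for per-node fast-math flags.
constexpr std::uint8_t FlagNone = 0;
constexpr std::uint8_t FlagContract = 1u << 0;

// Toy stand-in for a DAG node (NOT LLVM's SDNode).
struct Node {
  std::string Opcode;        // e.g. "fmul"
  std::vector<int> Operands; // ids of operand nodes, e.g. {41, 38}
  std::uint8_t Flags;        // bitmask of Flag* values
};

// Bug pattern: rebuild the node with new operands but forget the flags.
Node rebuildDroppingFlags(const Node &Old, std::vector<int> NewOps) {
  return Node{Old.Opcode, std::move(NewOps), FlagNone};
}

// Fixed pattern: carry the old node's flags over to the new node.
Node rebuildKeepingFlags(const Node &Old, std::vector<int> NewOps) {
  return Node{Old.Opcode, std::move(NewOps), Old.Flags};
}

// Simplified assumption: a later combine only fuses an fmul into an FMA
// when the fmul carries the contract flag.
bool canFuseIntoFMA(const Node &Mul) {
  return Mul.Opcode == "fmul" && (Mul.Flags & FlagContract) != 0;
}
```

In this model, rebuilding "fmul contract t41, t38" as "fmul t38, t38"
through rebuildDroppingFlags yields a node that canFuseIntoFMA rejects,
while rebuildKeepingFlags yields one that it still accepts.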

Co-authored-by: Sirish Pande <sirish.pande at amd.com>

Added: 
    

Modified: 
    llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
    llvm/test/CodeGen/AMDGPU/fma.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index aa367166e2a359e..39489e0bf142eb2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -2990,8 +2990,9 @@ bool TargetLowering::SimplifyDemandedVectorElts(
     SDValue NewOp1 = SimplifyMultipleUseDemandedVectorElts(Op1, DemandedElts,
                                                            TLO.DAG, Depth + 1);
     if (NewOp0 || NewOp1) {
-      SDValue NewOp = TLO.DAG.getNode(
-          Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0, NewOp1 ? NewOp1 : Op1);
+      SDValue NewOp =
+          TLO.DAG.getNode(Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0,
+                          NewOp1 ? NewOp1 : Op1, Op->getFlags());
       return TLO.CombineTo(Op, NewOp);
     }
     return false;

diff  --git a/llvm/test/CodeGen/AMDGPU/fma.ll b/llvm/test/CodeGen/AMDGPU/fma.ll
index 0f8560c1d7628a5..19bd5b8e62446f0 100644
--- a/llvm/test/CodeGen/AMDGPU/fma.ll
+++ b/llvm/test/CodeGen/AMDGPU/fma.ll
@@ -159,15 +159,14 @@ define float @fold_fmul_distributive(float %x, float %y) {
 define amdgpu_kernel void @vec_mul_scalar_add_fma(<2 x float> %a, <2 x float> %b, float %c1, ptr addrspace(1) %inptr) {
 ; GFX906-LABEL: vec_mul_scalar_add_fma:
 ; GFX906:       ; %bb.0:
+; GFX906-NEXT:    s_load_dword s8, s[0:1], 0x34
 ; GFX906-NEXT:    s_load_dwordx4 s[4:7], s[0:1], 0x24
-; GFX906-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX906-NEXT:    s_load_dword s5, s[0:1], 0x34
 ; GFX906-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x3c
 ; GFX906-NEXT:    v_mov_b32_e32 v0, 0
-; GFX906-NEXT:    v_mov_b32_e32 v1, s6
-; GFX906-NEXT:    v_mul_f32_e32 v1, s4, v1
 ; GFX906-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX906-NEXT:    v_add_f32_e32 v1, s5, v1
+; GFX906-NEXT:    v_mov_b32_e32 v1, s8
+; GFX906-NEXT:    v_mov_b32_e32 v2, s6
+; GFX906-NEXT:    v_fmac_f32_e32 v1, s4, v2
 ; GFX906-NEXT:    global_store_dword v0, v1, s[2:3] offset:4
 ; GFX906-NEXT:    s_endpgm
   %gep = getelementptr float, ptr addrspace(1) %inptr, i32 1