[llvm] [X86][ISel][FMA] Get a handle on operand nodes when negating FMA (PR #130176)

Vineet Kumar via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 12 14:29:47 PDT 2025


================
@@ -0,0 +1,25 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f -mattr=+fma | FileCheck %s
+
+define void @fma_neg(<8 x i1> %r280, ptr %pp1, ptr %pp2)  {
+; CHECK-LABEL: fma_neg:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpmovsxwq %xmm0, %zmm0
+; CHECK-NEXT:    vpsllq $63, %zmm0, %zmm0
+; CHECK-NEXT:    vptestmq %zmm0, %zmm0, %k1
+; CHECK-NEXT:    vmovdqu64 (%rdi), %zmm0
+; CHECK-NEXT:    vpxorq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm1 {%k1} {z}
+; CHECK-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
+; CHECK-NEXT:    vfnmadd213pd {{.*#+}} zmm2 = -(zmm0 * zmm2) + zmm1
+; CHECK-NEXT:    vmovupd %zmm2, (%rsi)
+; CHECK-NEXT:    vzeroupper
+; CHECK-NEXT:    retq
+  %r290 = load <8 x double>, ptr %pp1, align 8
+  %r307 = fneg <8 x double> %r290
+  %r309 = select <8 x i1> %r280, <8 x double> %r307, <8 x double> zeroinitializer
+  %r311 = tail call <8 x double> @llvm.x86.avx512.vfmadd.pd.512(<8 x double> %r307, <8 x double> zeroinitializer, <8 x double> %r309, i32 4)
----------------
vntkmr wrote:

Using the generic `@llvm.fma.v8f64` intrinsic, I am unable to reproduce the problem.

With `@llvm.x86.avx512.vfmadd.pd.512`, the DAG on entry to the `combineFMA` function in `X86ISelLowering.cpp` looks like:
```
  t0: ch,glue = EntryToken
  t37: v8i64,ch = load<(load (s512) from `ptr null`, align 8)> t0, Constant:i64<0>, undef:i64
  t13: v8f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>
          t43: i64 = X86ISD::Wrapper TargetConstantPool:i64<double -0.000000e+00> 0
        t41: v8f64,ch = X86ISD::VBROADCAST_LOAD<(load (s64) from constant-pool)> t0, t43
      t34: v8i64 = bitcast t41
    t35: v8i64 = xor t37, t34
  t36: v8f64 = bitcast t35
  t44: v8f64 = bitcast t37
                t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
              t21: v8i64 = sign_extend t2
            t29: v8i64 = X86ISD::VSHLI t21, TargetConstant:i8<63>
            t25: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>
          t27: v8i1 = setcc t29, t25, setne:ch
        t14: v8f64 = vselect t27, t36, t13
      t38: v8f64 = fma t36, t13, t14
      t6: i64,ch = CopyFromReg t0, Register:i64 %2
    t18: ch = store<(store (s512) into %ir.pp2, align 8)> t37:1, t38, t6, undef:i64
  t20: ch = X86ISD::RET_GLUE t18, TargetConstant:i32<0>
```
The call to `invertIfNegative(A)` on line 55675 creates the node `t44: v8f64 = bitcast t37`. When the `select` is then negated via the call to `invertIfNegative(C)`, `TargetLowering::getNegatedExpression` reaches `case ISD::VSELECT` (TargetLowering.cpp:7550), where `RemoveDeadNode(NegLHS)` sees no uses of this node and ends up deleting it.

For the `@llvm.fma.v8f64` intrinsic, the DAG on entry to the `combineFMA` function in `X86ISelLowering.cpp` looks like:
```
  t0: ch,glue = EntryToken
    t4: i64,ch = CopyFromReg t0, Register:i64 %1
  t10: v8f64,ch = load<(load (s512) from %ir.pp1, align 8)> t0, t4, undef:i64
  t11: v8f64 = fneg t10
  t13: v8f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>
            t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
          t7: v8i1 = truncate t2
        t14: v8f64 = vselect t7, t11, t13
      t15: v8f64 = fma t11, t13, t14
      t6: i64,ch = CopyFromReg t0, Register:i64 %2
    t16: ch = store<(store (s512) into %ir.pp2, align 8)> t10:1, t15, t6, undef:i64
  t18: ch = X86ISD::RET_GLUE t16, TargetConstant:i32<0>
```
In this case, the call to `invertIfNegative(A)` simply reuses the node `t10: v8f64,ch = load<(load (s512) from %ir.pp1, align 8)> t0, t4, undef:i64` instead of creating a bitcast for the xor. Since this node is in use, negating the select does not delete it, hence no assertion failure.
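
To make the difference between the two cases concrete, here is a minimal toy model in plain C++. It is only a sketch and does not use LLVM's API: `ToyDAG`, `ToyHandle`, and the helper functions are hypothetical names, and the graph tracks nothing but a use count per node. It assumes that the cleanup step deletes a node exactly when it has no users, and that holding a handle on a node (as the PR title suggests) amounts to adding a temporary use.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy stand-in for a DAG: only a use count per node is tracked.
// Hypothetical names throughout; this is not LLVM's SelectionDAG API.
struct ToyDAG {
  std::map<std::string, int> uses; // node name -> number of users

  // Create a node that has no users yet (like a freshly built bitcast).
  void create(const std::string &name) { uses.emplace(name, 0); }

  void addUse(const std::string &name) { ++uses.at(name); }
  void dropUse(const std::string &name) { --uses.at(name); }

  bool alive(const std::string &name) const { return uses.count(name) != 0; }

  // Models the cleanup step: delete the node only if nothing uses it.
  // Returns true if the node was deleted.
  bool removeIfDead(const std::string &name) {
    auto it = uses.find(name);
    if (it == uses.end() || it->second != 0)
      return false;
    uses.erase(it);
    return true;
  }
};

// RAII guard that holds an artificial use on a node for its lifetime,
// loosely analogous to keeping a handle on the node.
class ToyHandle {
  ToyDAG &dag;
  std::string name;

public:
  ToyHandle(ToyDAG &d, std::string n) : dag(d), name(std::move(n)) {
    dag.addUse(name);
  }
  ~ToyHandle() {
    if (dag.alive(name))
      dag.dropUse(name);
  }
  ToyHandle(const ToyHandle &) = delete;
  ToyHandle &operator=(const ToyHandle &) = delete;
};

// x86 intrinsic case: negating A produces a fresh node (t44) with no users,
// so a later cleanup deletes it unless a handle keeps it alive.
bool freshNodeSurvives(bool withHandle) {
  ToyDAG dag;
  dag.create("t44");
  if (withHandle) {
    ToyHandle h(dag, "t44");
    dag.removeIfDead("t44");
    return dag.alive("t44");
  }
  dag.removeIfDead("t44");
  return dag.alive("t44");
}

// Generic intrinsic case: negating A returns an existing node (t10) that
// already has a user (the fneg t11), so the cleanup leaves it alone.
bool reusedNodeSurvives() {
  ToyDAG dag;
  dag.create("t10");
  dag.addUse("t10");
  dag.removeIfDead("t10");
  return dag.alive("t10");
}
```

In this model, `freshNodeSurvives(false)` corresponds to the failing x86 intrinsic case, `freshNodeSurvives(true)` to the same case with a handle held across the cleanup, and `reusedNodeSurvives()` to the generic intrinsic case where the node is already in use.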

I can't find a simple way to reproduce it using the generic fma intrinsic. Do you have any suggestions?

https://github.com/llvm/llvm-project/pull/130176

