[llvm] [X86][ISel][FMA] Get a handle on operand nodes when negating FMA (PR #130176)
Vineet Kumar via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 12 14:29:47 PDT 2025
================
@@ -0,0 +1,25 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f -mattr=+fma | FileCheck %s
+
+define void @fma_neg(<8 x i1> %r280, ptr %pp1, ptr %pp2) {
+; CHECK-LABEL: fma_neg:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vpmovsxwq %xmm0, %zmm0
+; CHECK-NEXT: vpsllq $63, %zmm0, %zmm0
+; CHECK-NEXT: vptestmq %zmm0, %zmm0, %k1
+; CHECK-NEXT: vmovdqu64 (%rdi), %zmm0
+; CHECK-NEXT: vpxorq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm1 {%k1} {z}
+; CHECK-NEXT: vxorpd %xmm2, %xmm2, %xmm2
+; CHECK-NEXT: vfnmadd213pd {{.*#+}} zmm2 = -(zmm0 * zmm2) + zmm1
+; CHECK-NEXT: vmovupd %zmm2, (%rsi)
+; CHECK-NEXT: vzeroupper
+; CHECK-NEXT: retq
+ %r290 = load <8 x double>, ptr %pp1, align 8
+ %r307 = fneg <8 x double> %r290
+ %r309 = select <8 x i1> %r280, <8 x double> %r307, <8 x double> zeroinitializer
+ %r311 = tail call <8 x double> @llvm.x86.avx512.vfmadd.pd.512(<8 x double> %r307, <8 x double> zeroinitializer, <8 x double> %r309, i32 4)
----------------
vntkmr wrote:
With the generic `@llvm.fma.v8f64` intrinsic, I am unable to reproduce the problem.
With `@llvm.x86.avx512.vfmadd.pd.512`, the DAG on entry to `combineFMA` in `X86ISelLowering.cpp` looks like:
```
t0: ch,glue = EntryToken
t37: v8i64,ch = load<(load (s512) from `ptr null`, align 8)> t0, Constant:i64<0>, undef:i64
t13: v8f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>
t43: i64 = X86ISD::Wrapper TargetConstantPool:i64<double -0.000000e+00> 0
t41: v8f64,ch = X86ISD::VBROADCAST_LOAD<(load (s64) from constant-pool)> t0, t43
t34: v8i64 = bitcast t41
t35: v8i64 = xor t37, t34
t36: v8f64 = bitcast t35
t44: v8f64 = bitcast t37
t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
t21: v8i64 = sign_extend t2
t29: v8i64 = X86ISD::VSHLI t21, TargetConstant:i8<63>
t25: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>, Constant:i64<0>
t27: v8i1 = setcc t29, t25, setne:ch
t14: v8f64 = vselect t27, t36, t13
t38: v8f64 = fma t36, t13, t14
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t18: ch = store<(store (s512) into %ir.pp2, align 8)> t37:1, t38, t6, undef:i64
t20: ch = X86ISD::RET_GLUE t18, TargetConstant:i32<0>
```
The call to `invertIfNegative(A)` on line 55675 creates the node `t44: v8f64 = bitcast t37`. When the `select` is then negated via the call to `invertIfNegative(C)`, the `case ISD::VSELECT` path in `TargetLowering::getNegatedExpression` (TargetLowering.cpp:7550) calls `RemoveDeadNode(NegLHS)`, which sees no use for this node and deletes it.
For the `@llvm.fma.v8f64` intrinsic, the DAG on entry to `combineFMA` in `X86ISelLowering.cpp` looks like:
```
t0: ch,glue = EntryToken
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t10: v8f64,ch = load<(load (s512) from %ir.pp1, align 8)> t0, t4, undef:i64
t11: v8f64 = fneg t10
t13: v8f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>
t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
t7: v8i1 = truncate t2
t14: v8f64 = vselect t7, t11, t13
t15: v8f64 = fma t11, t13, t14
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t16: ch = store<(store (s512) into %ir.pp2, align 8)> t10:1, t15, t6, undef:i64
t18: ch = X86ISD::RET_GLUE t16, TargetConstant:i32<0>
```
In this case, the call to `invertIfNegative(A)` simply reuses the existing node `t10: v8f64,ch = load<(load (s512) from %ir.pp1, align 8)> t0, t4, undef:i64` instead of creating a new bitcast as in the xor case. Since this node is already in use, negating the select does not delete it, so no assertion is triggered.
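To make the difference between the two cases concrete, here is a rough toy model of the lifetime issue. It is only a sketch, not LLVM code: `ToyDAG`, `ToyNode`, `ToyHandle` and `negatedAStaysAlive` are hypothetical names, and `removeIfDead` only imitates the idea behind `RemoveDeadNode` (delete a node when nothing uses it). The `ToyHandle` part assumes that keeping an extra use on the node, in the spirit of LLVM's `HandleSDNode`, is enough to keep it alive.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

using NodeId = int;

// Toy stand-in for a DAG node: a description plus a use count.
struct ToyNode {
  std::string desc;
  int uses;
};

// Toy stand-in for the DAG (hypothetical, not SelectionDAG).
class ToyDAG {
  std::map<NodeId, ToyNode> nodes;
  NodeId next = 0;

public:
  NodeId create(std::string desc) {
    nodes.emplace(next, ToyNode{std::move(desc), 0});
    return next++;
  }
  bool isLive(NodeId id) const { return nodes.count(id) != 0; }
  void addUse(NodeId id) { ++nodes.at(id).uses; }
  void dropUse(NodeId id) {
    auto it = nodes.find(id);
    if (it != nodes.end() && it->second.uses > 0)
      --it->second.uses;
  }
  // Imitates the idea of RemoveDeadNode: only unused nodes are deleted.
  void removeIfDead(NodeId id) {
    auto it = nodes.find(id);
    if (it != nodes.end() && it->second.uses == 0)
      nodes.erase(it);
  }
};

// RAII handle that counts as a use for as long as it exists.
class ToyHandle {
  ToyDAG &dag;
  NodeId id;

public:
  ToyHandle(ToyDAG &d, NodeId n) : dag(d), id(n) { dag.addUse(id); }
  ~ToyHandle() { dag.dropUse(id); }
  ToyHandle(const ToyHandle &) = delete;
  ToyHandle &operator=(const ToyHandle &) = delete;
};

// aIsFresh = true models the x86 intrinsic case: negating A creates a new
// node with no users yet (like t44). aIsFresh = false models the generic
// case: the negated A is an existing node that already has users (like t10).
bool negatedAStaysAlive(bool aIsFresh, bool holdHandle) {
  ToyDAG dag;
  NodeId load = dag.create("load");
  dag.addUse(load); // the store uses the load's chain result

  NodeId negA = aIsFresh ? dag.create("bitcast of load") : load;

  std::unique_ptr<ToyHandle> handle;
  if (holdHandle)
    handle.reset(new ToyHandle(dag, negA));

  // Negating C (the select) arrives at the same node and then discards it
  // if it looks unused.
  dag.removeIfDead(negA);
  return dag.isLive(negA);
}
```

In this model `negatedAStaysAlive(true, false)` loses the node, `negatedAStaysAlive(false, false)` keeps it because the load already has a user, and `negatedAStaysAlive(true, true)` keeps it because the handle counts as a use.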
I can't find a simple way to reproduce it using the generic fma intrinsic. Do you have any suggestions?
https://github.com/llvm/llvm-project/pull/130176