[PATCH] D111638: [AArch64][SVE] Combine predicated FMUL/FADD into FMA

Peter Waller via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 2 06:54:43 PDT 2021


peterwaller-arm added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:702-713
+  auto m_SVEFAdd = [](auto p, auto w, auto x) {
+    return m_CombineOr(m_Intrinsic<Intrinsic::aarch64_sve_fadd>(p, w, x),
+                       m_Intrinsic<Intrinsic::aarch64_sve_fadd>(p, x, w));
+  };
+  auto m_SVEFMul = [](auto p, auto y, auto z) {
+    return m_Intrinsic<Intrinsic::aarch64_sve_fmul>(p, y, z);
+  };
----------------
MattDevereau wrote:
> paulwalker-arm wrote:
> > Is this correct?  It looks like you're allowing the `fmul` to be either operand of the `fadd`, effectively saying `fadd(p, a, b) == fadd(p, b, a)`.  Although true for the active lanes, the inactive lanes take the value of the first data operand and so these two forms are not identical.
> That sounds right but it seems like something that would probably be caught by a test somewhere. I've had a look at FMLA [[ https://developer.arm.com/documentation/ddi0602/2021-09/SVE-Instructions/FMLA--vectors---Floating-point-fused-multiply-add-vectors--predicated---writing-addend--Zda---Zda---Zn---Zm--?lang=en | here ]] and it says "Inactive elements in the destination vector register remain unmodified.". FADD, on the other hand, places the value of the first data operand in the destination vector, so FMLA and FADD seem to differ here. I'm probably wrong but from what I can see, the addition part of the FMLA can assume `fadd(p, a, b) == fadd(p, b, a)` but a normal FADD instruction cannot? Either way it seems like a more extensive set of rules should be considered for this
> That sounds right but it seems like something that would probably be caught by a test somewhere. 

Not sure what you mean there; can you be more specific? The "it" is ambiguous: what would be caught, and by which test?

> I've had a look at FMLA [...]

> FADD, on the other hand, places the value of the first data operand in the destination vector, so FMLA and FADD seem to differ here. I'm probably wrong but from what I can see, the addition part of the FMLA can assume `fadd(p, a, b) == fadd(p, b, a)` but a normal FADD instruction cannot? Either way it seems like a more extensive set of rules should be considered for this

I agree with what Paul's saying, consider: `X = A + (B * C)` and `Y = (B * C) + A`.

In the case of inactive lanes before the combine, `X` ends up with values from `A` and `Y` ends up with values from `(B * C)`.

But the combine as written rewrites both cases to `FMLA(A, B, C)`, and `FMLA(A, B, C)` takes lanes from `A` wherever the predicate is inactive.

Therefore, for the expression `Y`, the output does not have the lanes from `(B * C)` that it should.

So this needs updating, and we can ignore the Reassoc flag.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111638/new/

https://reviews.llvm.org/D111638
