[PATCH] D111638: [AArch64][SVE] Combine predicated FMUL/FADD into FMA

Fri Oct 22 07:05:30 PDT 2021

MattDevereau added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:719-723
+  llvm::FastMathFlags FMulFlags = cast<IntrinsicInst>(FMul)->getFastMathFlags();
+  if (!FAddFlags.allowContract() || !FMulFlags.allowContract())
+    return None;
+  if (FAddFlags != FMulFlags)
+    return None;
----------------
bsmith wrote:
> None of this seems to take into account the global fast-math options, i.e. the "unsafe-fp-math"="true" attribute, hence I don't think this optimization can ever be triggered from C, only directly written IR with the fast flags.
Compiling foo.c
```
#include <arm_sve.h>

svfloat16_t fmla_example(svbool_t p, svfloat16_t a, svfloat16_t b, svfloat16_t c) {
  return svadd_f16_m(p, a, svmul_f16_m(p, b, c));
}
```
with
```
clang foo.c -S -march=armv8-a+sve -emit-llvm -o - -Ofast
```
emits
```
; Function Attrs: mustprogress nofree nosync nounwind readnone uwtable willreturn vscale_range(0,16)
define dso_local <vscale x 8 x half> @fmla_example(<vscale x 16 x i1> %p, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c) local_unnamed_addr #0 {
entry:
  %0 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %p)
  %1 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmla.nxv8f16(<vscale x 8 x i1> %0, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c)
  ret <vscale x 8 x half> %1
}
```
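for me with this patch applied, so the combine does trigger from C when fast-math is enabled (here via -Ofast), because clang puts the `fast` flag on the intrinsic calls themselves.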
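
For reference, the gating logic in the quoted hunk can be modelled outside of LLVM roughly as below. This is only a sketch: `ModelFlags` and `canFuseToFMA` are hypothetical names that stand in for `llvm::FastMathFlags` and the combine's early-return checks, and only two of the fast-math bits are modelled.

```
// Hypothetical stand-in for llvm::FastMathFlags; models only two bits.
struct ModelFlags {
  bool AllowContract;
  bool AllowReassoc;

  bool operator==(const ModelFlags &Other) const {
    return AllowContract == Other.AllowContract &&
           AllowReassoc == Other.AllowReassoc;
  }
  bool operator!=(const ModelFlags &Other) const { return !(*this == Other); }
};

// Mirrors the two checks in the hunk: both operations must allow
// contraction, and their flag sets must be identical. Returns true when
// the fadd/fmul pair would be eligible for fusing under those checks.
bool canFuseToFMA(const ModelFlags &FAddFlags, const ModelFlags &FMulFlags) {
  if (!FAddFlags.AllowContract || !FMulFlags.AllowContract)
    return false;
  if (FAddFlags != FMulFlags)
    return false;
  return true;
}
```

With `-Ofast` both calls carry the same full set of flags, so a model like this accepts the pair; a pair where only one call allows contraction, or where the flag sets differ, is rejected.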

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111638/new/

https://reviews.llvm.org/D111638