[PATCH] D111638: [AArch64][SVE] Combine predicated FMUL/FADD into FMA
Matt Devereau via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 22 07:05:30 PDT 2021
MattDevereau added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:719-723
+  llvm::FastMathFlags FMulFlags = cast<IntrinsicInst>(FMul)->getFastMathFlags();
+  if (!FAddFlags.allowContract() || !FMulFlags.allowContract())
+    return None;
+  if (FAddFlags != FMulFlags)
+    return None;
----------------
bsmith wrote:
> None of this seems to take into account the global fast-math options, i.e. the "unsafe-fp-math"="true" attribute, hence I don't think this optimization can ever be triggered from C, only directly written IR with the fast flags.
Compiling foo.c
```
#include <arm_sve.h>

svfloat16_t fmla_example(svbool_t p, svfloat16_t a, svfloat16_t b, svfloat16_t c) {
  return svadd_f16_m(p, a, svmul_f16_m(p, b, c));
}
```
with
```
clang foo.c -S -march=armv8-a+sve -emit-llvm -o - -Ofast
```
emits
```
; Function Attrs: mustprogress nofree nosync nounwind readnone uwtable willreturn vscale_range(0,16)
define dso_local <vscale x 8 x half> @fmla_example(<vscale x 16 x i1> %p, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c) local_unnamed_addr #0 {
entry:
  %0 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %p)
  %1 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmla.nxv8f16(<vscale x 8 x i1> %0, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c)
  ret <vscale x 8 x half> %1
}
```
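for me with this patch applied. The call carries the `fast` flag and the fmul/fadd pair has been combined into a single fmla, so the combine does trigger from C when fast-math is enabled on the command line (here via -Ofast).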
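For reference, the guard in the quoted hunk can be modelled outside of LLVM roughly as below. This is only a sketch: `Flags`, `canFuseToFMA` and `runExamples` are hypothetical names, and `Flags` is a two-field stand-in for `llvm::FastMathFlags`, which carries more flags than shown here. It illustrates that the combine is allowed only when both calls permit contraction and their flag sets are identical.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::FastMathFlags; only two flags are modelled.
struct Flags {
  bool contract;
  bool reassoc;
  bool allowContract() const { return contract; }
};

// Two flag sets differ if any modelled flag differs.
inline bool operator!=(const Flags &A, const Flags &B) {
  return A.contract != B.contract || A.reassoc != B.reassoc;
}

// Same shape as the guard in the quoted hunk: both operations must allow
// contraction, and their flag sets must match exactly.
inline bool canFuseToFMA(const Flags &FAddFlags, const Flags &FMulFlags) {
  if (!FAddFlags.allowContract() || !FMulFlags.allowContract())
    return false;
  if (FAddFlags != FMulFlags)
    return false;
  return true;
}

// A few illustrative cases (assumed inputs, not taken from the patch's tests).
inline void runExamples() {
  const Flags Fast{true, true};      // every modelled flag set, like `fast`
  const Flags ContractOnly{true, false};
  const Flags None{false, false};

  assert(canFuseToFMA(Fast, Fast));          // matching flags, contract allowed
  assert(!canFuseToFMA(Fast, None));         // fmul does not allow contraction
  assert(!canFuseToFMA(Fast, ContractOnly)); // both contract, but flags differ
}
```

In the -Ofast output above both intrinsic calls would be emitted with `fast`, which corresponds to the first case.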
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D111638/new/
https://reviews.llvm.org/D111638