[PATCH] D99662: [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant

Fri Apr 9 10:53:37 PDT 2021

SjoerdMeijer accepted this revision.
SjoerdMeijer added a comment.
This revision is now accepted and ready to land.

Thanks, LGTM

================
Comment at: llvm/test/CodeGen/AArch64/arm64-fma-combines.ll:164
+  %add = fadd fast <2 x double> %mul, %ad
+  store <2 x double> %add, <2 x double>* %ret, align 16
+  br label %for.cond
----------------
asavonic wrote:
> SjoerdMeijer wrote:
> > Nit: can we just do a `ret %add` here?
> This an attempt to keep `shuffle` and `mul` in different basic blocks. If we return in the second basic block, the optimizer merges them into one before we reach instruction selection.
> In the latest revision of the patch I changed the tests to ensure that this never happens.
> 
Ok, thanks.

================
Comment at: llvm/test/CodeGen/AArch64/arm64-fma-combines.ll:1
-; RUN: llc < %s -O=3 -mtriple=arm64-apple-ios -mcpu=cyclone -enable-unsafe-fp-math | FileCheck %s
+; RUN: llc < %s -O=3 -mtriple=arm64-apple-ios -mcpu=cyclone -mattr=+fullfp16 -enable-unsafe-fp-math | FileCheck %s
 define void @foo_2d(double* %src) {
----------------
I don't know if the cyclone CPU supports FP16, I guess not, but just for testing purposes having it here I think is fine.

================
Comment at: llvm/test/CodeGen/AArch64/machine-combiner-fmul-dup.mir:215
+...
+# CHECK-LABEL: name: indexed_4s
+# CHECK:        [[OP1COPY:%.*]]:fpr128 = COPY $q1
----------------
For consistency probably best to add an indexed_8h test here too.If you're confident doing this, you can just do this without another review and commit it, but otherwise happy to look again. Similarly, you will probably need -mattr=+fullfp16 on the RUN line.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99662/new/

https://reviews.llvm.org/D99662