[PATCH] D40696: Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch's

Mon Jan 22 03:10:58 PST 2018

fhahn added reviewers: evandro, mcrosier.
fhahn added a comment.

Thanks for updating the patch! I have a few suggestions about reducing `fma-aggressive.ll `, but besides that and removing  `aarch64-enable-aggressive-fma` it looks quite good.

> I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from https://reviews.llvm.org/D41810 has no way of determining whether or not AggressiveFMA was enabled, or not.

It's fine to have a more complex test with more patterns than `test/CodeGen/AArch64/fma-simple.ll`, but `test/CodeGen/AArch64/fma-aggressive.ll` still contains lots of stuff unrelated the FMA fusion, e.g. `define double @reset(double %x, double %y) local_unnamed_addr #0 `is not used, the attributes and debug metadata is not needed, the global variables are not needed, the calls to fprint/frwite and so on are not needed and the loop should have not have an impact on the DAGCombiner.

I think the following test function should still expose patterns very similar to the original `fma-aggressive.ll`, but only focuses on the relevant bits for fusion:

  define double @test1(double %a, double %b, double %conv, i32 %rem) {
  entry:
    %conv.neg = fsub fast double -0.000000e+00, %conv
    %cmp3 = icmp eq i32 %rem, 0
    %. = select i1 %cmp3, double 1.000000e+00, double -1.000000e+00
    %add = fadd fast double %a, 1.000000e+00
    %add8 = fadd fast double %add, %.
    %mul = fmul fast double %add8, %a
    %add9 = fadd fast double %mul, 1.000000e+01
    %mul10 = fmul fast double %add9, 2.000000e+00
    %add11 = fsub fast double %conv.neg, %mul
    %sub = fadd fast double %add11, %mul10
    %mul14 = fmul fast double %sub, %mul
    %mul15 = fmul fast double %mul14, %.
    %sub17 = fadd fast double %mul15, %mul14
   br i1 %cmp3, label %if.then22, label %if.else27

  if.then22:
    %add23 = fadd fast double %sub17, -1.000000e+00
    %sub24 = fadd fast double %sub17, 1.000000e+00
    %mul25 = fmul fast double %add23, %sub24
    %sub26 = fsub fast double 1.000000e+00, %mul25
    br label %if.end32

  if.else27:
    %sub28 = fsub fast double -1.000000e+00, %sub17
    %sub29 = fadd fast double %sub17, 1.000000e+00
    %mul30 = fmul fast double %sub28, %sub29
    %add31 = fadd fast double %mul30, 1.000000e+00
    br label %if.end32

  if.end32:                                         ; preds = %if.else27, %if.then22
    %b.1 = phi double [ %sub26, %if.then22 ], [ %add31, %if.else27 ]
    %sub33 = fsub fast double 1.000000e+00, %sub17
    %mul34 = fmul fast double %sub17, %conv
    %mul35 = fmul fast double %mul34, %sub33
    %sub36 = fsub fast double %b.1, %.
    %add37 = fadd fast double %b.1, %mul35
    %add38 = fadd fast double %add37, %sub36
    ret double %add38
  }

If you think this is not sufficient for some reason, please at least remove the unneeded metadata, unused functions from the test case, replace the calls to fprintf & co that are needed to have uses for the FMA results with simpler functions (or return the result) and replace the globals by adding parameters if required. Also the other ifs should not matter to the combiner either, so I would simplify the CFG there as well.

> Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll

pending simplification in `fma-aggressive`, I would move this test function to `fma-aggressive.ll` too.

================
Comment at: lib/Target/AArch64/AArch64.td:152

+def FeatureAggressiveFMAFloat :
+  SubtargetFeature<"aggressive-fma-float",
----------------
How likely is it that we aggressive FMA is beneficial for floats or double only? IMO it's probably enough to just have `-aggressive-fma` for now.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:116

+static cl::opt<bool>
+EnableAggressiveFMA("aarch64-enable-aggressive-fma", cl::Hidden,
----------------
I think there is no need for this option any longer,  `-mattr=+aggressive-fma-float+aggressive-fma-double` can be used instead.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:10977
+bool AArch64TargetLowering::enableAggressiveFMAFusion(EVT VT) const {
+  if (EnableAggressiveFMA.getValue())
+    return VT.isFloatingPoint();
----------------
If we are only using subtarget features here, we can move this to the header I think.

================
Comment at: test/CodeGen/AArch64/fma-simple.ll:2
+; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=thunderx2t99 -fp-contract=fast < %s | FileCheck %s --check-prefix=CHECK-FMA
+; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=generic < %s | FileCheck %s --check-prefix=CHECK-GENERIC
+define double @test(double %x, double %y, double %z) {
----------------
Please also add a test that enables aggressive fma fusion using mattr for non-thunderX2.

Repository:
  rL LLVM

https://reviews.llvm.org/D40696