[PATCH] D39976: [AArch64] Consider the cost model when folding loads and stores

Thu Jan 11 13:15:51 PST 2018

evandro added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:714
+
+  // The replacement instr is profitable if it as fast as the other instrs,
+  /// but does not add more complexity.
----------------
gberry wrote:
> I would suggest that unless we know of a case where this makes a difference, make the simpler change of just doing
> ```
> return UopDif <= 0
> ```
> at this point, assuming that has the desired effect for your target.  This assumes that with latency and uops being equal, a reduction in instructions could still be beneficial.
This would lead to the test below, `stream-neon.ll`, to fail for the Exynos targets.  The reason is that, for instance, though the latency of loading a pair of quad registers with the post index addressing mode has the same latency as when using the register offset addressing mode, it is a two uop instruction, which severely limits the decode bandwidth by inserting a bubble.  This is a fairly common issue in other targets too, which this heuristic tries to capture, assuming that it's better to keep the decode bandwidth for a couple single uop instructions than incurring a decode bubble for a multiple uop instruction.  Except, of course, when optimizing for size.

================
Comment at: llvm/test/CodeGen/AArch64/ldst-opt.ll:1034
 ; CHECK: stp w{{[0-9]+}}, w{{[0-9]+}}, [sp], #16
-; CHECK: ret
   %src = alloca { i32, i32 }, align 8
----------------
gberry wrote:
> Are these changes intentional?
Yes, though not necessary.  Checking for the return is not germane to the feature being tested.

================
Comment at: llvm/test/CodeGen/AArch64/ldst-opt.ll:1343
 ; CHECK-LABEL: merge_zr32:
-; CHECK: // %entry
-; NOSTRICTALIGN-NEXT: str xzr, [x{{[0-9]+}}]
----------------
gberry wrote:
> Are all of these changes needed?
Ditto.

================
Comment at: llvm/test/CodeGen/AArch64/machine-outliner-remarks.ll:98

-attributes #0 = { noredzone nounwind ssp uwtable "no-frame-pointer-elim"="false" "target-cpu"="cyclone" }

----------------
gberry wrote:
> Are these changes needed?
Yes, since the changes in this patch affect the resulting code for this test, unless it's optimized for size.

================
Comment at: llvm/test/CodeGen/AArch64/stream-neon.ll:1
+; RUN: llc < %s -mtriple=aarch64-linux-gnu -mcpu=generic    -o - | FileCheck %s --check-prefixes=CHECK,GENERIC
+; RUN: llc < %s -mtriple=aarch64-linux-gnu -mcpu=cortex-a57 -o - | FileCheck %s --check-prefixes=CHECK,CORTEX
----------------
gberry wrote:
> This seems like it is too large to be a lit test.  Is it really testing anything that hasn't been tested already?
Probably not.  Perhaps the same could be accomplished if `ldst-opt.ll` were changed to test for the same targets as this test, when the result would be same.

Repository:
  rL LLVM

https://reviews.llvm.org/D39976