[PATCH] D68530: [AArch64] Make combining of callee-save and local stack adjustment optional

Nikolai Tillmann via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 4 17:42:13 PDT 2019


Nikolai created this revision.
Nikolai added reviewers: t.p.northover, gberry.
Herald added subscribers: llvm-commits, hiraditya, kristof.beyls.
Herald added a project: LLVM.

For arm64, https://reviews.llvm.org/D18619 introduced the ability to bump the stack pointer once, up front, when it needs to be adjusted for both the callee-save area and the local stack area.

That diff already remarks that "This change can cause an increase in instructions", but argues that even when that happens, it should still be a performance benefit because the number of micro-ops is reduced.

We have observed that this code-size increase can be significant in practice. To make this behavior configurable, this diff introduces a new hidden flag, `-never-combine-csr-local-stack-bump-for-size` (off by default). When set, it disables combining the stack bumps for methods that are marked as optimize-for-size.
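
As a rough illustration of where the new check sits, here is a minimal, self-contained sketch of the early exits. The names `FunctionInfo` and `mayCombineBumps` are hypothetical and exist only for this example; the real function in AArch64FrameLowering.cpp applies further conditions that are not modeled here.

```cpp
// Simplified, hypothetical model of the early exits touched by this diff.
// This is not the LLVM source; it only mirrors the two checks visible in it.
struct FunctionInfo {
  bool HasOptSize;         // function is marked as optimize-for-size
  unsigned LocalStackSize; // size of the local stack area, in bytes
};

// Returns true if combining the CSR and local stack bumps is still allowed
// after the two checks shown in the diff. Any later checks are ignored.
bool mayCombineBumps(bool NeverCombineForSize, const FunctionInfo &F) {
  // New in this diff: opt out when the flag is set and the function is
  // optimized for size.
  if (NeverCombineForSize && F.HasOptSize)
    return false;
  // Pre-existing check: nothing to combine without a local stack area.
  if (F.LocalStackSize == 0)
    return false;
  return true;
}
```

With the flag off, the result for an optimize-for-size function is unchanged from before this diff; only the combination of flag and optimize-for-size forces the split prologue.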

Example of a prologue with the default behavior (combining stack bumping when possible):

  sub        sp, sp, #0x40
  stp        d9, d8, [sp, #0x10]
  stp        x20, x19, [sp, #0x20]
  stp        x29, x30, [sp, #0x30]
  add        x29, sp, #0x30
  [... compute x8 somehow ...]
  stp        x0, x8, [sp]

And with `-never-combine-csr-local-stack-bump-for-size` enabled, for a method marked as optimize-for-size:

  stp        d9, d8, [sp, #-0x30]!
  stp        x20, x19, [sp, #0x10]
  stp        x29, x30, [sp, #0x20]
  add        x29, sp, #0x20
  [... compute x8 somehow ...]
  stp        x0, x8, [sp, #-0x10]!

Note that without combining, the two stack decrements are folded into the `stp` instructions as pre-indexed writebacks, whereas with combining there is a single, separate `sub sp, ...` instruction that cannot be folded. In this example the combined prologue is therefore one instruction longer.
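
The arithmetic behind the two example prologues can be checked with a small sketch. The constants below are read off the examples above (excluding the elided x8 computation, which is the same in both); the helper names are hypothetical and only cover this one example, not prologues in general.

```cpp
#include <cstddef>

// Every AArch64 (A64) instruction is encoded in 4 bytes.
constexpr std::size_t kInstrBytes = 4;

// Instruction counts taken from the two example prologues.
constexpr std::size_t kCombinedInstrs = 6; // sub + 3x stp + add + stp
constexpr std::size_t kSplitInstrs = 5;    // 3x stp + add + stp (two pre-indexed)

constexpr std::size_t prologueBytes(std::size_t Instrs) {
  return Instrs * kInstrBytes;
}

// Frame layout in the examples: 0x30 bytes of callee-saves, 0x10 of locals.
constexpr unsigned kCsrBytes = 0x30;
constexpr unsigned kLocalBytes = 0x10;

// In the combined prologue sp already points below the locals when the
// callee-saves are stored, so each callee-save offset grows by kLocalBytes.
constexpr unsigned combinedOffset(unsigned SplitOffset) {
  return SplitOffset + kLocalBytes;
}

static_assert(kCsrBytes + kLocalBytes == 0x40, "matches sub sp, sp, #0x40");
static_assert(combinedOffset(0x00) == 0x10, "d9, d8 slot");
static_assert(combinedOffset(0x10) == 0x20, "x20, x19 slot");
static_assert(combinedOffset(0x20) == 0x30, "x29, x30 slot");
```

Under these assumptions the split prologue saves one instruction, that is 4 bytes, per function in this example.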


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D68530

Files:
  llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
  llvm/test/CodeGen/AArch64/arm64-never-combine-csr-local-stack-bump.ll


Index: llvm/test/CodeGen/AArch64/arm64-never-combine-csr-local-stack-bump.ll
===================================================================
--- /dev/null
+++ llvm/test/CodeGen/AArch64/arm64-never-combine-csr-local-stack-bump.ll
@@ -0,0 +1,25 @@
+; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -disable-post-ra -never-combine-csr-local-stack-bump-for-size | FileCheck %s
+
+; CHECK-LABEL: main:
+; CHECK:       stp     x29, x30, [sp, #-16]!
+; CHECK-NEXT:  stp     xzr, xzr, [sp, #-16]!
+; CHECK:       adrp    x0, l_.str@PAGE
+; CHECK:       add     x0, x0, l_.str@PAGEOFF
+; CHECK-NEXT:  bl      _puts
+; CHECK-NEXT:   add     sp, sp, #16
+; CHECK-NEXT:	ldp	x29, x30, [sp], #16
+; CHECK-NEXT:	ret
+
+@.str = private unnamed_addr constant [7 x i8] c"hello\0A\00"
+
+define i32 @main() nounwind ssp optsize {
+entry:
+  %local1 = alloca i64, align 8
+  %local2 = alloca i64, align 8
+  store i64 0, i64* %local1
+  store i64 0, i64* %local2
+  %call = call i32 @puts(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i32 0, i32 0))
+  ret i32 %call
+}
+
+declare i32 @puts(i8*)
\ No newline at end of file
Index: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -170,6 +170,11 @@
                          cl::desc("reverse the CSR restore sequence"),
                          cl::init(false), cl::Hidden);
 
+static cl::opt<bool> NeverCombineCSRLocalStackBumpForSize(
+    "never-combine-csr-local-stack-bump-for-size", cl::init(false), cl::Hidden,
+    cl::desc("Never combine CSR+local stack bump when optimizing a function "
+             "for size (default = off)"));
+
 STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
 
 /// This is the biggest offset to the stack pointer we can encode in aarch64
@@ -447,6 +452,9 @@
   const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
   const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
 
+  if (NeverCombineCSRLocalStackBumpForSize && MF.getFunction().hasOptSize())
+    return false;
+
   if (AFI->getLocalStackSize() == 0)
     return false;
 


