[PATCH] D18619: [AArch64] Combine callee-save and local stack SP adjustment instructions.

Wed May 4 11:24:34 PDT 2016

t.p.northover added inline comments.

================
Comment at: lib/Target/AArch64/AArch64FrameLowering.cpp:336
@@ +335,3 @@
+  // For pre/post inc/dec, convert to non inc/dec and update the stack offset.
+  case AArch64::STPXpre:
+    NewOpc = AArch64::STPXi;
----------------
gberry wrote:
> Yes, that's true with the following caveat.  We would be relying on AArch64LoadStoreOptimizer to combine the first/last store/load and sp decrement/increment in the case where we decide not to combine the SP bumps (e.g. if the local stack size is 0).  I think that is generally okay, though when I have tried simplifying the code in this way in the past I did run into one phase ordering issue.  If I recall correctly, occasionally tail duplication of the return block would no longer get done because it appears to the tail merger that the ret block had 3 instructions (add, ldp, ret) instead of 2 (ldp post inc, ret), which caused it to make a different decision.  This was due to the fact that tail duplication was being done before AArch64LoadStoreOptimizer.  Do you think this simplifcation of the code is worth it in light of this extra complication?  I will also say I didn't see any performance degradations because of the tail duplication difference, I just noticed it looking over code diffs.
I'm not opposed to using the load/store optimizer for that if it works (and not particularly worried about the tail-duplication change -- if it's ever important we can tweak its heuristic).

But unless I'm missing something performing that transformation here would still be simpler than the proposed patch. We'd just have a `makeMemOpPrePost(MBBI, TotalOffset)` call if emitPrologue/emitEpilogue decide it would be useful, wouldn't we?

================
Comment at: lib/Target/AArch64/AArch64FrameLowering.cpp:412
@@ +411,3 @@
+  }
+  assert(LocalStackSize % 8 == 0);
+  // All generated opcodes have scaled offsets.
----------------
gberry wrote:
> I agree, though I'm not sure this is the place to check for that.  I mainly added this assert because of the (LocalStackSize / 8) on the next line.
> I can add asserts that the SP bumps are always 16-byte aligned when they are generated.  Does that seem reasonable to you?
Yep, that sounds fine.

http://reviews.llvm.org/D18619