[PATCH] D18619: [AArch64] Combine callee-save and local stack SP adjustment instructions.

Wed May 4 11:15:45 PDT 2016

gberry added inline comments.

================
Comment at: lib/Target/AArch64/AArch64FrameLowering.cpp:336
@@ +335,3 @@
+  // For pre/post inc/dec, convert to non inc/dec and update the stack offset.
+  case AArch64::STPXpre:
+    NewOpc = AArch64::STPXi;
----------------
Yes, that's true with the following caveat.  We would be relying on AArch64LoadStoreOptimizer to combine the first/last store/load and sp decrement/increment in the case where we decide not to combine the SP bumps (e.g. if the local stack size is 0).  I think that is generally okay, though when I have tried simplifying the code in this way in the past I did run into one phase ordering issue.  If I recall correctly, occasionally tail duplication of the return block would no longer get done because it appears to the tail merger that the ret block had 3 instructions (add, ldp, ret) instead of 2 (ldp post inc, ret), which caused it to make a different decision.  This was due to the fact that tail duplication was being done before AArch64LoadStoreOptimizer.  Do you think this simplifcation of the code is worth it in light of this extra complication?  I will also say I didn't see any performance degradations because of the tail duplication difference, I just noticed it looking over code diffs.

================
Comment at: lib/Target/AArch64/AArch64FrameLowering.cpp:412
@@ +411,3 @@
+  }
+  assert(LocalStackSize % 8 == 0);
+  // All generated opcodes have scaled offsets.
----------------
I agree, though I'm not sure this is the place to check for that.  I mainly added this assert because of the (LocalStackSize / 8) on the next line.
I can add asserts that the SP bumps are always 16-byte aligned when they are generated.  Does that seem reasonable to you?

http://reviews.llvm.org/D18619