[PATCH] D15340: Separate CSRs into a subset handled via explicit copies and a subset handled by prologue/epilogue

Manman Ren via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 9 09:35:11 PST 2015


manmanren added a comment.

In http://reviews.llvm.org/D15340#305838, @hfinkel wrote:

> I'd like to make sure that I understand the situation:
>
> You have functions that you'd like to shrink wrap, but you can't because you have calls to tlv_get_addr that post-dominate the entry, and tlv_get_addr clobbers the LR and FP registers. Normally, this requires LR and FP to be saved to the stack in the prologue, and here, you still do. And you still can't shrink wrap. But, instead of saving the other CSRs to the stack, you copy them to other vregs, the thought being that you'll spill them as needed anyway, and you might as well let the register allocator make that decision than blindly spilling them all up front.


Yes, exactly!

> Saving CSRs by copying normally does not make sense if you'll immediately spill them, but does make sense if you have high-register pressure only in some cold path. I don't understand what this has to do with special calling conventions for TLS, but it seems like a generally-good thing to do any time you have high register pressure in some (cold) part(s) of a function and low register pressure elsewhere.


Yes, this can be used in a general way, but the patches currently enable it only for CXX_FAST_TLS. The performance of CXX_FAST_TLS will be worse than a normal calling convention without these patches, since we will preserve all the CSRs on the hot path.

Do functions with this CXX_FAST_TLS calling convention always exhibit that pattern?
Yes, as implemented in clang front end (specified by Itanium ABI, we will have a one-time initialization block that can be slow).

Should we endeavor to detect it more generally?
My current plan is to enable it only for CXX_FAST_TLS, when people see other potential performance improvement, it can be made more general with a motivating testing case.

Thanks for reviewing!
Manman


================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:472
@@ +471,3 @@
+  // and the function has a single exit.
+  FuncInfo->SplitCSR = Fn.getCallingConv() == CallingConv::CXX_FAST_TLS &&
+                       TLI->supportSplitCSR();
----------------
hfinkel wrote:
> Restricting this to CallingConv::CXX_FAST_TLS does not belong in target-independent code. Please move this into the TLI->supportSplitCSR() implementation.
Will do

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:480
@@ +479,3 @@
+      bool IsReturn = !BB.empty() && isa<ReturnInst>(&BB.back());
+      if (succ_empty(&BB) || IsReturn) {
+        // If we already saw an exit, bail out. If this is an exit, but
----------------
hfinkel wrote:
> Am I correct in assuming that the `succ_empty(&BB)` check is for blocks that end in, at the IR level, an unreachable? If so, I'm not sure this logic is right, because we should be able to have as many unreachables as we'd like, so long as we have only one actual return.
Am I correct in assuming that the succ_empty(&BB) check is for blocks that end in, at the IR level, an unreachable?
Blocks ending with unreachable will have succ_empty(&BB) being true.

This currently is over-conservative, because the current implementation assumes a single return and the explicit copies will be inserted in the return block. I think we should be able to allow multiple unreachable, without explicit copies for the unreachable blocks.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:493
@@ +492,3 @@
+
+    if (!SingleReturn)
+      FuncInfo->SplitCSR = false;
----------------
hfinkel wrote:
> Fundamentally, why are we checking for a single return? Wouldn't you just insert the vreg->csr copies into each return block independently should there by more than one?
Yes, fundamentally, we can allow multiple returns, as long as we put explicit copies in each return block. I am trying to simplify the logic here, since in CXX_FAST_TLS case, we have a single return. I am okay with making it more general.


http://reviews.llvm.org/D15340





More information about the llvm-commits mailing list