[llvm-dev] Less aggressive on the first allocation of CSR if detecting an early exit

Fri Nov 17 10:10:10 PST 2017

> On Nov 16, 2017, at 2:31 PM, junbuml at codeaurora.org wrote:
> 
> On 2017-11-14 17:22, Quentin Colombet wrote:
>> Hi,
>> I think it is kind of artificial to tie the CSRCost with the presence
>> of calls.
>> I think I’ve already mentioned it in one of the review, but I
>> believe it would be better to differentiate when we want to use a CSR
>> to avoid spilling or to avoid splitting. CSR instead of spilling is
>> good, CSR instead of splitting, not so good :).
> 
> About this, I can see your previous comment in D27366 (I copied it below) :
>  Also, that's possible that the right fix/simple fix is to have one CSRCost for split and one for spill.
>  Indeed, IIRC, right now the returned cost for both spilling and splitting is going to be the sum of the frequencies where the split/spill happen and given the spill and copy have different cost, we may want to have different comparison.
>  E.g., CSRCostForSpill = 5 (ok to trade against more than 5 executed spill/reload) but CSRCostForSpilt = 20 (ok to trade against more than 20 executed copies)
> 
> If this is what you meant here, is the CSRCostForSpilt the actual cost directly comparable with the split cost?
> Or, it should be multiplied with the entry frequency to be comparable with the split cost, considering that the CSRCost is the cost of spilling CSR in the entry?

I believe it should be compared directly with the split cost. I.e., CSRCostXXX against CostOfTheOtherChoiceWithFrequency.

> 
> 
>> By doing this, I would expect we mechanically get the desired behavior
>> that CSRs get used for live-ranges that go through calls (otherwise we
>> would have spilled).
>> My 2c.
>> Cheers,
>> -Quentin
>>> On Nov 10, 2017, at 12:34 PM, junbuml at codeaurora.org wrote:
>>> On 2017-11-10 07:47, Nemanja Ivanovic wrote:
>>>> One thing I thought about doing a while back and never really
>>>> wrote a
>>>> POC for is the following:
>>>> - Make FirstCSRCost a property of the MachineBasicBlock (or create
>>>> a
>>>> map of MBB* -> FirstCSRCost)
>>>> - Implement a pre-RA pass that will populate the map as follows:
>>>> - Identify all blocks with calls
>>>> - Find the nearest common dominator (NCD) to all those blocks
>>>> (not
>>>> strict so that a block with a call might be that NCD)
>>>> - If the NCD is the entry block, CSR allocation is cheap in all
>>>> blocks
>>>> - Make CSR allocation cheap in blocks that are in the dominator
>>>> tree
>>>> rooted at NCD
>>>> The idea would be to favour CSR allocation in blocks that might be
>>>> eligible for the prologue and favour splitting in blocks that we'd
>>>> prefer not to have a prologue in (or before).
>>>> Then a CFG such as this:
>>>> A
>>>> / \
>>>> B   C
>>>> |  / \
>>>> | D   E
>>>> | |  /
>>>> | | /
>>>> | |/
>>>> | /
>>>> |/
>>>> F
>>>> - Assume calls are in B and ANY_OF(C,D,E): CSR allocation is cheap
>>>> everywhere
>>>> - Assume calls are in C or ALL_OF(D,E): CSR allocation is cheap in
>>>> ALL_OF(C,D,E); CSR allocation is expensive in ALL_OF(A,B,F)
>>>> - Assume only call is in ANY_OF(B,D,E): CSR allocation is cheap
>>>> only
>>>> in that block, expensive everywhere else
>>>> I think this construction would give us what we want, but there
>>>> may be
>>>> [obvious] counter-examples I haven't thought of.
>>> Thanks Nemanja for sharing your idea. I think this might cover the
>>> case I was targeting, and we may not need to be limited only in the
>>> entry block.
>>> However, I bit worry if this cause RegAllc to slow in allocating
>>> CSRs.  What if we have a call-site in the early exit path and no
>>> call-site in all other blocks. Then, conservative allocation of CSRs
>>> might not be a good choice if the reg pressure is high in the all
>>> other blocks.  As our first step, would it make sense to limit this
>>> only when we detect an early exit? I guess Quentin may have some
>>> comment.
>>> thanks,
>>> Jun
>>> On Tue, Oct 31, 2017 at 8:38 PM, via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>> On 2017-10-30 21:20, Hal Finkel wrote:
>>> On 10/30/2017 12:20 PM, junbuml at codeaurora.org wrote:
>>> On 2017-10-27 19:50, Hal Finkel wrote:
>>> On 10/27/2017 03:32 PM, Jun Lim via llvm-dev wrote:
>>> When compiling C code below for AArach64, I saw that
>>> shrink-wrapping
>>> didn't happen due to the very early uses of CSRs in the entry block.
>>> So CSR spills/reloads are executed even when the early exit block is
>>> taken.
>>> int getI(int i);
>>> int foo(int *P, int i) {
>>> if (i>0)
>>> return P[i];
>>> i = getI(i);
>>> return P[i];
>>> }
>>> It's not that hard to find such cases where RegAllocGreedy
>>> aggressively allocates a CSRs when a live range expands across a
>>> call-site.  That's because of the conservatively initialized
>>> CSRCost, causing RegAllocGreedy to strongly favour allocating a CSR
>>> over splitting a region. Since allocation of CSRs requires the cost
>>> of spilling CSRs, allocating CSRs is not always beneficial. Like the
>>> case above, if a function has an early exit code, we may want to be
>>> less aggressive on the first allocation of CSR in the entry block by
>>> increasing the CSRCost.
>>> Previously, I proposed https://reviews.llvm.org/D34608 [1] in this
>>> matter, but the way I detect the profitable cases and the way I
>>> increase the CRSCost was somewhat unclear. Now, I'm thinking to less
>>> aggressive on the first allocation of CSR in the entry block in case
>>> where the function has an early exit so that encourage more
>>> shrink-wrapping and avoid executing CSR spill/recover when the early
>>> exit is taken. By sending this out, I just want to get any high
>>> level feedback early. Please let me know if anyone has any opinion
>>> about this.
>>> So the heuristic will have nothing to do with the presence of calls?
>>> Might this increase spilling in loops?
>>> -Hal
>>> Before allowing the first allocation from CSRs, I will check if the
>>> virtual register is really live across a call in other blocks. If
>>> the
>>> function have a call in entry or exit, we don't need to increase the
>>> CSRCost. This heuristic will be applied only for the very first
>>> allocation from CSRs only in the entry block when the function has
>>> an
>>> early exit; if a CSR is already allocated in the function, we will
>>> use
>>> the current default global CSRCost.
>>> Even after increasing the CSRCost, we should end up allocating the
>>> first CSR if the cost of splitting the live-range is still higher
>>> than
>>> the increased CSRCost. I believe the amount we want to increase the
>>> CSRCost must be target-dependent, but it must be conservative enough
>>> to avoid too many copies in the related spots.
>>> Thanks for explaining.
>>> I suppose you'll want to make sure that the call(s) in question come
>>> after the early exit (i.e., that there aren't calls before the early
>>> exit)?
>>> I will check if a virtual register is live across only calls which
>>> will be executed when the early exit is not taken. By increasing
>>> CSRcost for such case, we increase chances to avoid executing CSR
>>> spill/recover when the early exit is taken.
>>> When you say "only in the entry block" you mean that the live range
>>> starts in the entry block, right (i.e., that it is a function
>>> parameter or a vreg defined by some instruction in the entry block)?
>>> Yes.
>>> Does it matter that it is in the entry block, or do you only need it
>>> to come before the early exit and have an execution frequency <= to
>>> the entry block's execution frequency?
>>> The reason I specifically check the entry block is because for now I
>>> see the early exit happen only in entry block; limiting that the
>>> early
>>> exit condition is checked in the entry block and branch to the exit
>>> block directly from the entry block if hitting the condition. If
>>> overall approach is reasonable, we can certainly extend it.
>>> -Hal
>>> Thanks,
>>> Jun
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2]
>>> -- Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>> --
>>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
>>> Technologies, Inc.
>>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
>>> Linux Foundation Collaborative Project.
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2]
>>> Links:
>>> ------
>>> [1] https://reviews.llvm.org/D34608
>>> [2] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> --
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
>> Technologies, Inc.
>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
>> Linux Foundation Collaborative Project.
> 
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171117/581bddb1/attachment.html>