[llvm-dev] Less aggressive on the first allocation of CSR if detecting an early exit

Fri Nov 17 20:50:02 PST 2017

> On Nov 17, 2017, at 11:15 AM, junbuml at codeaurora.org wrote:
> 
> On 2017-11-17 13:10, Quentin Colombet wrote:
>>> On Nov 16, 2017, at 2:31 PM, junbuml at codeaurora.org wrote:
>>> On 2017-11-14 17:22, Quentin Colombet wrote:
>>>> Hi,
>>>> I think it is kind of artificial to tie the CSRCost with the
>>>> presence
>>>> of calls.
>>>> I think I’ve already mentioned it in one of the review, but I
>>>> believe it would be better to differentiate when we want to use a
>>>> CSR
>>>> to avoid spilling or to avoid splitting. CSR instead of spilling
>>>> is
>>>> good, CSR instead of splitting, not so good :).
>>> About this, I can see your previous comment in D27366 (I copied it
>>> below) :
>>> Also, that's possible that the right fix/simple fix is to have one
>>> CSRCost for split and one for spill.
>>> Indeed, IIRC, right now the returned cost for both spilling and
>>> splitting is going to be the sum of the frequencies where the
>>> split/spill happen and given the spill and copy have different cost,
>>> we may want to have different comparison.
>>> E.g., CSRCostForSpill = 5 (ok to trade against more than 5 executed
>>> spill/reload) but CSRCostForSpilt = 20 (ok to trade against more
>>> than 20 executed copies)
>>> If this is what you meant here, is the CSRCostForSpilt the actual
>>> cost directly comparable with the split cost?
>>> Or, it should be multiplied with the entry frequency to be
>>> comparable with the split cost, considering that the CSRCost is the
>>> cost of spilling CSR in the entry?
>> I believe it should be compared directly with the split cost. I.e.,
>> CSRCostXXX against CostOfTheOtherChoiceWithFrequency.
> 
> As far as I know the spill/split cost which is compared with CSRCost is the sum of the frequencies of spots where spill/split happen, and the CSRCost is the cost of spilling CSR at the entry. If we say, for example, 20 copies from pre-split are okay to trade against 1 csr spill at the entry, then shouldn't we multiply FreqOfEntry with the 20 (number of tradable copies)?

Yes, you’re right with FreqOfEntry. I assumed this was 1.

> Please me know if I miss something here?
> 
> 
>>> By doing this, I would expect we mechanically get the desired
>>> behavior
>>> that CSRs get used for live-ranges that go through calls (otherwise
>>> we
>>> would have spilled).
>>> My 2c.
>>> Cheers,
>>> -Quentin
>>> On Nov 10, 2017, at 12:34 PM, junbuml at codeaurora.org wrote:
>>> On 2017-11-10 07:47, Nemanja Ivanovic wrote:
>>> One thing I thought about doing a while back and never really
>>> wrote a
>>> POC for is the following:
>>> - Make FirstCSRCost a property of the MachineBasicBlock (or create
>>> a
>>> map of MBB* -> FirstCSRCost)
>>> - Implement a pre-RA pass that will populate the map as follows:
>>> - Identify all blocks with calls
>>> - Find the nearest common dominator (NCD) to all those blocks
>>> (not
>>> strict so that a block with a call might be that NCD)
>>> - If the NCD is the entry block, CSR allocation is cheap in all
>>> blocks
>>> - Make CSR allocation cheap in blocks that are in the dominator
>>> tree
>>> rooted at NCD
>>> The idea would be to favour CSR allocation in blocks that might be
>>> eligible for the prologue and favour splitting in blocks that we'd
>>> prefer not to have a prologue in (or before).
>>> Then a CFG such as this:
>>> A
>>> / \
>>> B   C
>>> |  / \
>>> | D   E
>>> | |  /
>>> | | /
>>> | |/
>>> | /
>>> |/
>>> F
>>> - Assume calls are in B and ANY_OF(C,D,E): CSR allocation is cheap
>>> everywhere
>>> - Assume calls are in C or ALL_OF(D,E): CSR allocation is cheap in
>>> ALL_OF(C,D,E); CSR allocation is expensive in ALL_OF(A,B,F)
>>> - Assume only call is in ANY_OF(B,D,E): CSR allocation is cheap
>>> only
>>> in that block, expensive everywhere else
>>> I think this construction would give us what we want, but there
>>> may be
>>> [obvious] counter-examples I haven't thought of.
>>> Thanks Nemanja for sharing your idea. I think this might cover the
>>> case I was targeting, and we may not need to be limited only in the
>>> entry block.
>>> However, I bit worry if this cause RegAllc to slow in allocating
>>> CSRs.  What if we have a call-site in the early exit path and no
>>> call-site in all other blocks. Then, conservative allocation of CSRs
>>> might not be a good choice if the reg pressure is high in the all
>>> other blocks.  As our first step, would it make sense to limit this
>>> only when we detect an early exit? I guess Quentin may have some
>>> comment.
>>> thanks,
>>> Jun
>>> On Tue, Oct 31, 2017 at 8:38 PM, via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>> On 2017-10-30 21:20, Hal Finkel wrote:
>>> On 10/30/2017 12:20 PM, junbuml at codeaurora.org wrote:
>>> On 2017-10-27 19:50, Hal Finkel wrote:
>>> On 10/27/2017 03:32 PM, Jun Lim via llvm-dev wrote:
>>> When compiling C code below for AArach64, I saw that
>>> shrink-wrapping
>>> didn't happen due to the very early uses of CSRs in the entry block.
>>> So CSR spills/reloads are executed even when the early exit block is
>>> taken.
>>> int getI(int i);
>>> int foo(int *P, int i) {
>>> if (i>0)
>>> return P[i];
>>> i = getI(i);
>>> return P[i];
>>> }
>>> It's not that hard to find such cases where RegAllocGreedy
>>> aggressively allocates a CSRs when a live range expands across a
>>> call-site.  That's because of the conservatively initialized
>>> CSRCost, causing RegAllocGreedy to strongly favour allocating a CSR
>>> over splitting a region. Since allocation of CSRs requires the cost
>>> of spilling CSRs, allocating CSRs is not always beneficial. Like the
>>> case above, if a function has an early exit code, we may want to be
>>> less aggressive on the first allocation of CSR in the entry block by
>>> increasing the CSRCost.
>>> Previously, I proposed https://reviews.llvm.org/D34608 [1] in this
>>> matter, but the way I detect the profitable cases and the way I
>>> increase the CRSCost was somewhat unclear. Now, I'm thinking to less
>>> aggressive on the first allocation of CSR in the entry block in case
>>> where the function has an early exit so that encourage more
>>> shrink-wrapping and avoid executing CSR spill/recover when the early
>>> exit is taken. By sending this out, I just want to get any high
>>> level feedback early. Please let me know if anyone has any opinion
>>> about this.
>>> So the heuristic will have nothing to do with the presence of calls?
>>> Might this increase spilling in loops?
>>> -Hal
>>> Before allowing the first allocation from CSRs, I will check if the
>>> virtual register is really live across a call in other blocks. If
>>> the
>>> function have a call in entry or exit, we don't need to increase the
>>> CSRCost. This heuristic will be applied only for the very first
>>> allocation from CSRs only in the entry block when the function has
>>> an
>>> early exit; if a CSR is already allocated in the function, we will
>>> use
>>> the current default global CSRCost.
>>> Even after increasing the CSRCost, we should end up allocating the
>>> first CSR if the cost of splitting the live-range is still higher
>>> than
>>> the increased CSRCost. I believe the amount we want to increase the
>>> CSRCost must be target-dependent, but it must be conservative enough
>>> to avoid too many copies in the related spots.
>>> Thanks for explaining.
>>> I suppose you'll want to make sure that the call(s) in question come
>>> after the early exit (i.e., that there aren't calls before the early
>>> exit)?
>>> I will check if a virtual register is live across only calls which
>>> will be executed when the early exit is not taken. By increasing
>>> CSRcost for such case, we increase chances to avoid executing CSR
>>> spill/recover when the early exit is taken.
>>> When you say "only in the entry block" you mean that the live range
>>> starts in the entry block, right (i.e., that it is a function
>>> parameter or a vreg defined by some instruction in the entry block)?
>>> Yes.
>>> Does it matter that it is in the entry block, or do you only need it
>>> to come before the early exit and have an execution frequency <= to
>>> the entry block's execution frequency?
>>> The reason I specifically check the entry block is because for now I
>>> see the early exit happen only in entry block; limiting that the
>>> early
>>> exit condition is checked in the entry block and branch to the exit
>>> block directly from the entry block if hitting the condition. If
>>> overall approach is reasonable, we can certainly extend it.
>>> -Hal
>>> Thanks,
>>> Jun
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2]
>>> -- Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>> --
>>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
>>> Technologies, Inc.
>>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
>>> Linux Foundation Collaborative Project.
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2]
>>> Links:
>>> ------
>>> [1] https://reviews.llvm.org/D34608
>>> [2] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> --
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
>> Technologies, Inc.
>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
>> Linux Foundation Collaborative Project.
>> --
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
>> Technologies, Inc.
>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
>> Linux Foundation Collaborative Project.
> 
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171117/f3f8ef25/attachment.html>