[llvm-dev] Less aggressive on the first allocation of CSR if detecting an early exit

Fri Nov 17 11:15:54 PST 2017

On 2017-11-17 13:10, Quentin Colombet wrote:
>> On Nov 16, 2017, at 2:31 PM, junbuml at codeaurora.org wrote:
>> On 2017-11-14 17:22, Quentin Colombet wrote:
>> 
>>> Hi,
>>> I think it is kind of artificial to tie the CSRCost with the
>>> presence
>>> of calls.
>>> I think I’ve already mentioned it in one of the review, but I
>>> believe it would be better to differentiate when we want to use a
>>> CSR
>>> to avoid spilling or to avoid splitting. CSR instead of spilling
>>> is
>>> good, CSR instead of splitting, not so good :).
>> 
>> About this, I can see your previous comment in D27366 (I copied it
>> below) :
>> Also, that's possible that the right fix/simple fix is to have one
>> CSRCost for split and one for spill.
>> Indeed, IIRC, right now the returned cost for both spilling and
>> splitting is going to be the sum of the frequencies where the
>> split/spill happen and given the spill and copy have different cost,
>> we may want to have different comparison.
>> E.g., CSRCostForSpill = 5 (ok to trade against more than 5 executed
>> spill/reload) but CSRCostForSpilt = 20 (ok to trade against more
>> than 20 executed copies)
>> 
>> If this is what you meant here, is the CSRCostForSpilt the actual
>> cost directly comparable with the split cost?
>> Or, it should be multiplied with the entry frequency to be
>> comparable with the split cost, considering that the CSRCost is the
>> cost of spilling CSR in the entry?
> 
> I believe it should be compared directly with the split cost. I.e.,
> CSRCostXXX against CostOfTheOtherChoiceWithFrequency.

As far as I know the spill/split cost which is compared with CSRCost is 
the sum of the frequencies of spots where spill/split happen, and the 
CSRCost is the cost of spilling CSR at the entry. If we say, for 
example, 20 copies from pre-split are okay to trade against 1 csr spill 
at the entry, then shouldn't we multiply FreqOfEntry with the 20 (number 
of tradable copies)? Please me know if I miss something here?

> 
>> By doing this, I would expect we mechanically get the desired
>> behavior
>> that CSRs get used for live-ranges that go through calls (otherwise
>> we
>> would have spilled).
>> My 2c.
>> Cheers,
>> -Quentin
>> On Nov 10, 2017, at 12:34 PM, junbuml at codeaurora.org wrote:
>> On 2017-11-10 07:47, Nemanja Ivanovic wrote:
>> One thing I thought about doing a while back and never really
>> wrote a
>> POC for is the following:
>> - Make FirstCSRCost a property of the MachineBasicBlock (or create
>> a
>> map of MBB* -> FirstCSRCost)
>> - Implement a pre-RA pass that will populate the map as follows:
>> - Identify all blocks with calls
>> - Find the nearest common dominator (NCD) to all those blocks
>> (not
>> strict so that a block with a call might be that NCD)
>> - If the NCD is the entry block, CSR allocation is cheap in all
>> blocks
>> - Make CSR allocation cheap in blocks that are in the dominator
>> tree
>> rooted at NCD
>> The idea would be to favour CSR allocation in blocks that might be
>> eligible for the prologue and favour splitting in blocks that we'd
>> prefer not to have a prologue in (or before).
>> Then a CFG such as this:
>> A
>> / \
>> B   C
>> |  / \
>> | D   E
>> | |  /
>> | | /
>> | |/
>> | /
>> |/
>> F
>> - Assume calls are in B and ANY_OF(C,D,E): CSR allocation is cheap
>> everywhere
>> - Assume calls are in C or ALL_OF(D,E): CSR allocation is cheap in
>> ALL_OF(C,D,E); CSR allocation is expensive in ALL_OF(A,B,F)
>> - Assume only call is in ANY_OF(B,D,E): CSR allocation is cheap
>> only
>> in that block, expensive everywhere else
>> I think this construction would give us what we want, but there
>> may be
>> [obvious] counter-examples I haven't thought of.
>> Thanks Nemanja for sharing your idea. I think this might cover the
>> case I was targeting, and we may not need to be limited only in the
>> entry block.
>> However, I bit worry if this cause RegAllc to slow in allocating
>> CSRs.  What if we have a call-site in the early exit path and no
>> call-site in all other blocks. Then, conservative allocation of CSRs
>> might not be a good choice if the reg pressure is high in the all
>> other blocks.  As our first step, would it make sense to limit this
>> only when we detect an early exit? I guess Quentin may have some
>> comment.
>> thanks,
>> Jun
>> On Tue, Oct 31, 2017 at 8:38 PM, via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> On 2017-10-30 21:20, Hal Finkel wrote:
>> On 10/30/2017 12:20 PM, junbuml at codeaurora.org wrote:
>> On 2017-10-27 19:50, Hal Finkel wrote:
>> On 10/27/2017 03:32 PM, Jun Lim via llvm-dev wrote:
>> When compiling C code below for AArach64, I saw that
>> shrink-wrapping
>> didn't happen due to the very early uses of CSRs in the entry block.
>> So CSR spills/reloads are executed even when the early exit block is
>> taken.
>> int getI(int i);
>> int foo(int *P, int i) {
>> if (i>0)
>> return P[i];
>> i = getI(i);
>> return P[i];
>> }
>> It's not that hard to find such cases where RegAllocGreedy
>> aggressively allocates a CSRs when a live range expands across a
>> call-site.  That's because of the conservatively initialized
>> CSRCost, causing RegAllocGreedy to strongly favour allocating a CSR
>> over splitting a region. Since allocation of CSRs requires the cost
>> of spilling CSRs, allocating CSRs is not always beneficial. Like the
>> case above, if a function has an early exit code, we may want to be
>> less aggressive on the first allocation of CSR in the entry block by
>> increasing the CSRCost.
>> Previously, I proposed https://reviews.llvm.org/D34608 [1] in this
>> matter, but the way I detect the profitable cases and the way I
>> increase the CRSCost was somewhat unclear. Now, I'm thinking to less
>> aggressive on the first allocation of CSR in the entry block in case
>> where the function has an early exit so that encourage more
>> shrink-wrapping and avoid executing CSR spill/recover when the early
>> exit is taken. By sending this out, I just want to get any high
>> level feedback early. Please let me know if anyone has any opinion
>> about this.
>> So the heuristic will have nothing to do with the presence of calls?
>> Might this increase spilling in loops?
>> -Hal
>> Before allowing the first allocation from CSRs, I will check if the
>> virtual register is really live across a call in other blocks. If
>> the
>> function have a call in entry or exit, we don't need to increase the
>> CSRCost. This heuristic will be applied only for the very first
>> allocation from CSRs only in the entry block when the function has
>> an
>> early exit; if a CSR is already allocated in the function, we will
>> use
>> the current default global CSRCost.
>> Even after increasing the CSRCost, we should end up allocating the
>> first CSR if the cost of splitting the live-range is still higher
>> than
>> the increased CSRCost. I believe the amount we want to increase the
>> CSRCost must be target-dependent, but it must be conservative enough
>> to avoid too many copies in the related spots.
>> Thanks for explaining.
>> I suppose you'll want to make sure that the call(s) in question come
>> after the early exit (i.e., that there aren't calls before the early
>> exit)?
>> I will check if a virtual register is live across only calls which
>> will be executed when the early exit is not taken. By increasing
>> CSRcost for such case, we increase chances to avoid executing CSR
>> spill/recover when the early exit is taken.
>> When you say "only in the entry block" you mean that the live range
>> starts in the entry block, right (i.e., that it is a function
>> parameter or a vreg defined by some instruction in the entry block)?
>> Yes.
>> Does it matter that it is in the entry block, or do you only need it
>> to come before the early exit and have an execution frequency <= to
>> the entry block's execution frequency?
>> The reason I specifically check the entry block is because for now I
>> see the early exit happen only in entry block; limiting that the
>> early
>> exit condition is checked in the entry block and branch to the exit
>> block directly from the entry block if hitting the condition. If
>> overall approach is reasonable, we can certainly extend it.
>> -Hal
>> Thanks,
>> Jun
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2]
>> -- Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>> --
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
>> Technologies, Inc.
>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
>> Linux Foundation Collaborative Project.
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2]
>> Links:
>> ------
>> [1] https://reviews.llvm.org/D34608
>> [2] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>  --
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
> Linux Foundation Collaborative Project.
> 
> --
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
> Linux Foundation Collaborative Project.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a 
Linux Foundation Collaborative Project.