[PATCH] D25963: [LoopUnroll] Implement profile-based loop peeling

Wed Oct 26 16:43:00 PDT 2016

On Wed, Oct 26, 2016 at 3:03 PM, Michael Kuperstein <mkuper at google.com>
wrote:

>
>
> On Wed, Oct 26, 2016 at 1:09 PM, David Li <davidxl at google.com> wrote:
>
>> davidxl added inline comments.
>>
>>
>> ================
>> Comment at: lib/Transforms/Utils/LoopUnrollPeel.cpp:101
>> +      // We no longer know anything about the branch probability.
>> +      LatchBR->setMetadata(LLVMContext::MD_prof, nullptr);
>> +    }
>> ----------------
>> mkuper wrote:
>> > davidxl wrote:
>> > > Why? I think we should update the branch probability here -- it
>> depends on which iteration of the peeled clone this is. If peel count <
>> average/estimated trip count, then each peeled iteration should be more
>> biased towards fall through. If peel_count == est trip_count, then the last
>> peel iteration should be biased toward exit.
>> > You're right, it's not that we don't know anything - but we don't know
>> enough. I'm not sure how to attach a reasonable number to this, without
>> knowing the distribution.
>> > Do you have any suggestions? The trivial option would be to assume an
>> extremely narrow distribution (the loop always exits after exactly K
>> iterations), but that would mean having an extreme bias for all of the
>> branches, and I'm not sure that's wise.
>> A reasonable way to annotate the branch is like this.
>> Say the original trip count of the loop is N; then for the m-th (from 0
>> to N-1) peeled iteration, the fall-through probability is a decreasing
>> function:
>>
>> (N - m)/N
>>
>>
> I'm not entirely sure the math works out - because N is the average
>

Yes -- N is the average -- but this is due to a limitation of PGO. To get
the trip count distribution, we need to do value profiling of the loop trip
count or have a path-sensitive profile. This is future work. For now we need
to focus on what we have, with good heuristics.

With current PGO, the back branch probability is already estimated to be
N/(N+1), which can be inaccurate depending on the trip count distribution.
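
To make the N/(N+1) estimate concrete, here is a minimal sketch. It assumes
N means the average number of times the back edge is taken per loop entry,
i.e. WB/WE for latch weights (WB, WE). The helper names are hypothetical and
are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): average number of back-edge
// traversals per loop entry, derived from latch branch weights.
// Assumption: every loop entry exits exactly once, so WE counts entries.
double averageBackedgeCount(uint64_t WB, uint64_t WE) {
  assert(WE != 0 && "the profile must contain at least one loop exit");
  return static_cast<double>(WB) / static_cast<double>(WE);
}

// Single shared back-branch probability implied by an average count N.
// This is the N / (N + 1) estimate; it ignores the shape of the
// trip count distribution entirely.
double backBranchProbability(double N) {
  assert(N >= 0.0 && "average count cannot be negative");
  return N / (N + 1.0);
}
```

With weights (90, 10), averageBackedgeCount gives 9 and
backBranchProbability(9) gives 0.9, which is the same as 90/(90+10).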

> the newly assigned weights ought to have the property that the total
> probability of reaching the loop header is 0.5 -
>

Why should this constraint exist? The constraints that should be satisfied
are 1) the total frequency of the loop exit remains unchanged; 2) the total
header (including cloned ones) frequency equals the original header
frequency; 3) the header frequency of the first peeled iteration equals
the original preheader frequency.

The original conditional branch (for the loop back edge) has one
shared/'average' branch probability for all iterations. Once the branch is
cloned via peeling, more (temporal) context information is available, so the
conditional branch probabilities of those cloned branches can be refined
-- the intuition is that the closer the iteration is to the end of the
loop, the more likely the branch is to exit.

> and I don't think that happens here.
>

> This also doesn't solve the problem of what probability to assign to the
> loop backedge - if K is the random variable signifying the number of
> iterations, I think it should be something like 1/(E[K | K > E[K]] - E[K]).
> That is, it depends on the expected number of iterations given that we
> have more iterations than average. Which we don't know, and we can't even
> bound.
> E.g. imagine that we have a loop that runs for 1 iteration for a million
> times, and a million iterations once. The average number of iterations is
> 2, but the probability of taking the backedge, once you've reached the
> loop, is extremely high.
>

We just need to update the existing branch weights data slightly. Ideally,
we would first assign branch probabilities for the conditional branches of
the cloned iterations, and then use the constraints I mentioned above to
adjust the weights. However, I think it can be simplified as follows:

Suppose the branch weight vector is (WB, WE), where WB is the weight of the
edge to the loop header and WE is the weight of the edge to the exit block;
then the new weights can be something like (WB - m*WE, WE), where m is the
number of peeled iterations.
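
A minimal sketch of this update, written as a standalone function rather
than against the real metadata APIs. The function name is hypothetical, and
the clamp to 1 when m*WE reaches WB is my assumption, not something stated
above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: latch weights (to header, to exit) for the loop
// that remains after peeling m iterations, using (WB - m*WE, WE).
std::pair<uint64_t, uint64_t> remainingLoopWeights(uint32_t WB, uint32_t WE,
                                                   uint32_t m) {
  // Compute in 64 bits so that m * WE cannot overflow for 32-bit inputs.
  uint64_t Peeled = static_cast<uint64_t>(m) * WE;
  // Assumption (not from the thread): clamp to 1 instead of going to zero
  // or negative when the peeled iterations cover the whole back-edge weight.
  uint64_t NewWB = WB > Peeled ? WB - Peeled : 1;
  return {NewWB, WE};
}
```

For example, weights (100, 10) with m = 3 become (70, 10).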

[Proof]. Assume the fall-through probability of the i-th cloned conditional
branch is P_i. The header weight of the first cloned iteration is WE, so the
total edge weight from the cloned iterations to the exit block is

  WE * (1 - P_1*P_2*...*P_m)

(the i-th cloned iteration is reached with weight WE*P_1*...*P_(i-1) and
exits with probability 1 - P_i), so the new exit edge weight of the
remaining loop is

  WE * P_1*P_2*...*P_m

Assuming each P_i is close to 1, this approximates to WE.

Similarly, the new header weight of the loop is about (WB - m*WE)
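
The accounting can also be checked numerically by pushing the entry weight
WE through the peeled copies one at a time. This is only a sketch: the
struct and function names are hypothetical, and the P_i values are inputs
chosen by the caller, not something the profile provides.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bookkeeping for the weight flowing through peeled copies.
struct PeelFlow {
  double PeeledExit;     // weight leaving via exits of the peeled copies
  double RemainingEntry; // weight falling through into the remaining loop
  double PeeledHeader;   // summed header weight of the peeled copies
};

// WE is the weight entering the first peeled copy; P[i] is the
// fall-through probability of the i-th peeled conditional branch.
PeelFlow accountPeeledFlow(double WE, const std::vector<double> &P) {
  PeelFlow F{0.0, WE, 0.0};
  for (double Pi : P) {
    assert(Pi >= 0.0 && Pi <= 1.0 && "P_i must be a probability");
    F.PeeledHeader += F.RemainingEntry;            // this copy's header
    F.PeeledExit += F.RemainingEntry * (1.0 - Pi); // exits taken here
    F.RemainingEntry *= Pi;                        // falls through
  }
  return F;
}
```

Each entry into the remaining loop leaves it exactly once, so RemainingEntry
is also the remaining loop's exit weight. With WE = 1000 and two copies at
P_i = 0.99, it comes out near 980, and PeeledHeader near 1990, i.e. close to
WE and m*WE respectively.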

>
> We could assume something like a uniform distribution between, say, 0 and
> 2 * N iterations (in which case the fall-through probability is, I think (2
> * N - m - 1) / (2 * N), and the backedge probability is something like 1 -
> 1/(1.5 * N) )  - but I don't know if that's realistic either.
>

I am not sure making such an assumption about the distribution is a
reasonable thing to do. I think it is more reasonable to assume a narrower
distribution and adjust the weights in a simple way (we are not doing
anything worse than what already happens today).

David

>
>
>> Add some fuzzing factor to avoid creating an extremely biased branch
>> probability:
>>
>> for instance (N-m)*3/(4*N)
>>
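
A minimal sketch of the per-iteration probability suggested in the quoted
comment, combining (N - m)/N with the 3/4 factor. The function name and the
Damping parameter are hypothetical; 0.75 is just the example value from the
comment.

```cpp
#include <cassert>

// Hypothetical helper: fall-through probability for the m-th peeled
// iteration (m from 0 to N-1), scaled by a damping factor so that no
// peeled branch ends up extremely biased.
double peeledFallThroughProbability(unsigned N, unsigned m,
                                    double Damping = 0.75) {
  assert(N > 0 && m < N && "m must index a peeled iteration in [0, N)");
  assert(Damping > 0.0 && Damping <= 1.0);
  return Damping * static_cast<double>(N - m) / static_cast<double>(N);
}
```

With N = 4, iterations m = 0, 1, 2, 3 get 0.75, 0.5625, 0.375 and 0.1875, so
the probability decreases with m and never exceeds the damping factor.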
>>
>> https://reviews.llvm.org/D25963
>>
>>
>>
>>
>