[PATCH] D25963: [LoopUnroll] Implement profile-based loop peeling

Xinliang David Li via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 27 11:52:39 PDT 2016


On Thu, Oct 27, 2016 at 10:02 AM, Michael Kuperstein <mkuper at google.com>
wrote:

> Thanks for the analysis, David; more inline.
>
> On Wed, Oct 26, 2016 at 4:43 PM, Xinliang David Li <davidxl at google.com>
> wrote:
>
>>
>>
>> On Wed, Oct 26, 2016 at 3:03 PM, Michael Kuperstein <mkuper at google.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 1:09 PM, David Li <davidxl at google.com> wrote:
>>>
>>>> davidxl added inline comments.
>>>>
>>>>
>>>> ================
>>>> Comment at: lib/Transforms/Utils/LoopUnrollPeel.cpp:101
>>>> +      // We no longer know anything about the branch probability.
>>>> +      LatchBR->setMetadata(LLVMContext::MD_prof, nullptr);
>>>> +    }
>>>> ----------------
>>>> mkuper wrote:
>>>> > davidxl wrote:
>>>> > > Why? I think we should update the branch probability here -- it
>>>> depends on which iteration of the peeled clone we are in. If peel count <
>>>> average/estimated trip count, then each peeled iteration should be more
>>>> biased towards fall-through. If peel count == estimated trip count, then
>>>> the last peeled iteration should be biased toward exit.
>>>> > You're right, it's not that we don't know anything - but we don't
>>>> know enough. I'm not sure how to attach a reasonable number to this,
>>>> without knowing the distribution.
>>>> > Do you have any suggestions? The trivial option would be to assume an
>>>> extremely narrow distribution (the loop always exits after exactly K
>>>> iterations), but that would mean having an extreme bias for all of the
>>>> branches, and I'm not sure that's wise.
>>>> A reasonable way to annotate the branch is like this.
>>>> Say the original trip count of the loop is N; then for the m-th (from 0
>>>> to N-1) peeled iteration, the fall-through probability is the decreasing
>>>> function:
>>>>
>>>> (N - m)/N
>>>>
>>>>
>>> I'm not entirely sure the math works out - because N is the average
>>>
>>
>> Yes -- N is the average -- but this is due to a limitation of PGO. To get
>> the trip count distribution, we would need to do value profiling of loop
>> trip counts or have a path-sensitive profile. This is future work. For now
>> we need to focus on what we have, with good heuristics.
>>
>>
> Sure, didn't mean to imply otherwise.
>
>
>> With current PGO, the back branch probability is already estimated to be
>> N/(N+1) which can be inaccurate depending on trip count distribution.
>>
>>
>>
>>> the newly assigned weights ought to have the property that the total
>>> probability of reaching the loop header is 0.5 -
>>>
>>
>>
>> Why should this constraint exist? The constraints that should be
>> satisfied are: 1) the total frequency of the loop exit remains unchanged;
>> 2) the total header frequency (including the cloned headers) equals the
>> original header frequency; 3) the header frequency of the first peeled
>> iteration equals the original preheader frequency.
>>
>>
> You're right, the constraint I suggested is nonsense, it's really
> distribution-dependent.
>
>
>>
>>  The original conditional branch (for the loop backedge) has one
>> shared/'average' branch probability across all iterations. Once the branch
>> is cloned via peeling, more contextual (temporal) information is
>> available, and the conditional branch probabilities of the cloned branches
>> can be refined -- the intuition is that the closer the iteration is to the
>> end of the loop, the more likely the branch is to exit.
>>
>
> What bothers me somewhat is that this doesn't hold in general.
> It does hold if the distribution is more or less normal around the average
> - early iterations usually fall through, but the closer we get to N, the
> more often we exit.
> The problem is that it doesn't hold for long-tail distributions (or
> anything else that is biased towards low counts) - which may also be common.
>
> Consider:
> 1 iteration  - 128
> 2 iterations - 64
> 3 iterations - 32
> etc.
>
> In this case, the fall-through probability is always 0.5, regardless of
> iteration number.
> I'm really not sure which of the two cases is more common/natural.
>


In general, assuming the trip count distribution function is f(x), x = 0,
1, ..., then the i-th (counting from 1) peeled iteration's fall-through
probability should be

SUM(f(n), n = i+1, ...) / Total

In your example, the first peeled iteration should have fall-through prob:

(64+32+...)/(128+64+32+...) = (Total-128)/Total, where Total is the sum of
the counts over all possible trip counts.

For the second peeled iteration, Prob_2 = (Total - 128 - 64)/Total

There are two extreme cases here: the first is to assume the trip count
distribution is centered around the average trip count with a very small
standard deviation, and the second is what you proposed -- a uniform
distribution from 0 to 2N.

With the first model, the fall-through branch prob of successive iterations
decreases very slowly until the iteration number gets close to the average
trip count N, where there should be a sharp drop.

With the uniform distribution model, we get (2N - m - 1)/(2N). Without
more precise data, it is probably reasonable to use your model.

Note that with the uniform model, the fall-through prob for the first
iteration is (N-1)/N, which is basically the same as the original loop's
branch prob, i.e. the average case. When N is small, the probability seems
too low -- but I don't know for sure -- we can always revisit this in the
future if we see problems.
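The closed form for the uniform model is trivial to encode; again a purely illustrative sketch, with the iteration index taken 1-based so the first iteration matches the (N-1)/N observation:

```cpp
// Illustrative sketch, assuming the trip count is uniformly distributed over
// [0, 2N).  M is taken 1-based here so that the first peeled iteration gets
// (2N - 2)/(2N) = (N - 1)/N, matching the observation above.
double uniformFallThroughProb(unsigned N, unsigned M) {
  return double(2 * N - M - 1) / double(2 * N);
}
```

For N = 10 the first peeled iteration gets 0.9, and each later iteration drops by 1/(2N) = 0.05.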

David


>
> Anyway, as I said, I have no real intuition here, so I'm ok with doing it
> either way.
>
>
>>
>>
>>> and I don't think that happens here.
>>>
>>
>>> This also doesn't solve the problem of what probability to assign to the
>>> loop backedge - if K is the random variable signifying the number of
>>> iterations, I think it should be something like 1/(E[K | K > E[K]] - E[K]).
>>> That is, it depends on the expected number of iterations given that we
>>> have more iterations than average. Which we don't know, and we can't even
>>> bound.
>>> E.g. imagine that we have a loop that runs for 1 iteration a million
>>> times, and for a million iterations once. The average number of
>>> iterations is 2, but the probability of taking the backedge, once you've
>>> reached the loop, is extremely high.
>>>
>>
>> We just need to update the existing branch weight data slightly.
>> Ideally, we would first assign branch probabilities to the conditional
>> branches of the cloned iterations, and then use the constraints I
>> mentioned above to adjust the weights. However, I think it can be
>> simplified as follows:
>>
>> Suppose the branch weight vector is (WB, WE), where WB is the weight of
>> the edge to the loop header and WE is the weight of the edge to the exit
>> block. Then the new weight can be something like (WB - m*WE, WE), where m
>> is the number of peeled iterations.
>>
>>
> I don't think we really need this simplification - it sounds pretty
> straightforward to track the weights while assigning them to the peeled
> branches.
>
>
>> [Proof] Assume the fall-through probability of the i-th cloned conditional
>> branch is P_i, and note that the header weight of the first cloned
>> iteration is WE. Then the total edge weight from the cloned iterations to
>> the exit block is
>>
>>   WE * (1 - P_1*P_2*...*P_m)
>>
>> so the new exit edge weight of the remaining loop is
>>
>> WE * P_1*P_2*...*P_m
>>
>> Assuming each P_i is close to 1, this approximates to WE.
>>
>>
> Even for the narrow normal distribution case, at least P_m won't be close
> to 1. I'm not sure this matters in practice - but since tracking the
> weights doesn't sound hard, I'll try that first.
>
>
>> Similarly, the new header weight of the loop is about (WB - m*WE)
>>
>>
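For what it's worth, the simplified (WB - m*WE, WE) update quoted above can be sketched as plain arithmetic (illustrative only -- the helper name is hypothetical, and the real patch would rewrite MD_prof branch-weight metadata instead):

```cpp
#include <utility>

// Illustrative sketch of the simplified update quoted above, not the patch's
// actual metadata-handling code: given the original latch weights (WB, WE)
// -- WB for the backedge, WE for the exit edge -- peeling m iterations
// leaves the remaining loop's latch weights as roughly (WB - m*WE, WE).
std::pair<unsigned, unsigned>
peeledLatchWeights(unsigned WB, unsigned WE, unsigned M) {
  // Clamp at 1 so the backedge weight never underflows for large peel counts.
  unsigned NewWB = (WB > M * WE) ? WB - M * WE : 1;
  return {NewWB, WE};
}
```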
>>
>>>
>>> We could assume something like a uniform distribution between, say, 0
>>> and 2 * N iterations (in which case the fall-through probability is, I
>>> think, (2 * N - m - 1) / (2 * N), and the backedge probability is
>>> something like 1 - 1/(1.5 * N)) - but I don't know if that's realistic
>>> either.
>>>
>>
>> I am not sure making such an assumption about the distribution is a
>> reasonable thing to do. I think it is more reasonable to assume a
>> narrower distribution and adjust the weights in a simple way (we are not
>> doing anything worse than what is already happening today).
>>
>> David
>>
>>
>>
>>>
>>>
>>>> Add some fudge factor to avoid creating an extremely biased branch prob:
>>>>
>>>> for instance, (N-m)*3/(4*N)
>>>>
>>>>
>>>> https://reviews.llvm.org/D25963
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

