[PATCH] D25963: [LoopUnroll] Implement profile-based loop peeling

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 27 10:02:02 PDT 2016


Thanks for the analysis, David; more inline.

On Wed, Oct 26, 2016 at 4:43 PM, Xinliang David Li <davidxl at google.com>
wrote:

>
>
> On Wed, Oct 26, 2016 at 3:03 PM, Michael Kuperstein <mkuper at google.com>
> wrote:
>
>>
>>
>> On Wed, Oct 26, 2016 at 1:09 PM, David Li <davidxl at google.com> wrote:
>>
>>> davidxl added inline comments.
>>>
>>>
>>> ================
>>> Comment at: lib/Transforms/Utils/LoopUnrollPeel.cpp:101
>>> +      // We no longer know anything about the branch probability.
>>> +      LatchBR->setMetadata(LLVMContext::MD_prof, nullptr);
>>> +    }
>>> ----------------
>>> mkuper wrote:
>>> > davidxl wrote:
>>> > > Why? I think we should update the branch probability here -- it
>>> depends on which iteration of the peeled clone it is. If peel count <
>>> average/estimated trip count, then each peeled iteration should be more
>>> biased towards fall through. If peel_count == est trip_count, then the last
>>> peel iteration should be biased toward exit.
>>> > You're right, it's not that we don't know anything - but we don't know
>>> enough. I'm not sure how to attach a reasonable number to this, without
>>> knowing the distribution.
>>> > Do you have any suggestions? The trivial option would be to assume an
>>> extremely narrow distribution (the loop always exits after exactly K
>>> iterations), but that would mean having an extreme bias for all of the
>>> branches, and I'm not sure that's wise.
>>> A reasonable way to annotate the branch is like this.
>>> Say the original trip count of the loop is N; then for the m-th (from 0
>>> to N-1) peeled iteration, the fall-through probability is a decreasing
>>> function:
>>>
>>> (N - m) / N
>>>
>>>
>> I'm not entirely sure the math works out - because N is the average
>>
>
> Yes -- N is the average -- but this is due to a limitation of PGO. To get
> the trip count distribution, we would need to do value profiling of the loop
> trip count or have a path-sensitive profile. This is future work. For now we
> need to focus on good heuristics based on what we have.
>
>
Sure, I didn't mean to imply otherwise.


> With current PGO, the back-branch probability is already estimated to be
> N/(N+1), which can be inaccurate depending on the trip count distribution.
>
>
>
>> the newly assigned weights ought to have the property that the total
>> probability of reaching the loop header is 0.5 -
>>
>
>
> Why should this constraint exist? The constraints that should be
> satisfied are: 1) the total frequency of the loop exit remains unchanged;
> 2) the total header frequency (including the cloned headers) equals the
> original header frequency; 3) the header frequency of the first peeled
> iteration equals the original preheader frequency.
>
>
You're right, the constraint I suggested is nonsense; it's really
distribution-dependent.
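For my own bookkeeping, restating your three constraints as flow equations -
where F(X) is the block frequency of X, peel_i is the i-th peeled copy of the
header, and the primed blocks belong to the loop that remains after peeling
(my notation, not anything in the patch):

  F(exit_peel_1) + ... + F(exit_peel_m) + F(exit') = F(exit)
  F(peel_1) + ... + F(peel_m) + F(header') = F(header)
  F(peel_1) = F(preheader)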


>
>  The original conditional branch (for the loop back edge) has one
> shared/'average' branch probability across iterations. Once the branch is
> cloned via peeling, more contextual (temporal) information is available, so
> the conditional branch probabilities of those cloned branches can be refined
> -- the intuition is that the closer the iteration is to the end of the
> loop, the more likely it is to branch to the exit.
>

What bothers me somewhat is that this doesn't hold in general.
It does hold if the distribution is more or less normal around the average
- early iterations usually fall through, but the closer we get to N, the
more often we exit.
The problem is that it doesn't hold for long-tail distributions (or
anything else that is biased towards low counts) - which may also be common.

Consider:
1 iteration  - 128 times
2 iterations -  64 times
3 iterations -  32 times
etc.

In this case, the fall-through probability is always 0.5, regardless of
iteration number.
I'm really not sure which of the two cases is more common/natural.
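Spelling the arithmetic out, treating the "etc." as the geometric series
continuing:

  iteration 1: the latch runs 128 + 64 + 32 + ... = 256 times, and the
               backedge is taken 64 + 32 + ... = 128 times  ->  128/256 = 0.5
  iteration 2: the latch runs 64 + 32 + 16 + ... = 128 times, and the
               backedge is taken 32 + 16 + ... =  64 times  ->   64/128 = 0.5

and so on for every iteration.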

Anyway, as I said, I have no real intuition here, so I'm ok with doing it
either way.


>
>
>> and I don't think that happens here.
>>
>
>> This also doesn't solve the problem of what probability to assign to the
>> loop backedge - if K is the random variable signifying the number of
>> iterations, I think it should be something like 1/(E[K | K > E[K]] - E[K]).
>> That is, it depends on the expected number of iterations given that we
>> have more iterations than average. Which we don't know, and we can't even
>> bound.
>> E.g. imagine that we have a loop that runs for 1 iteration for a million
>> times, and a million iterations once. The average number of iterations is
>> 2, but the probability of taking the backedge, once you've reached the
>> loop, is extremely high.
>>
>
> We just need to update the existing branch weights data slightly. Ideally,
> we would first assign branch probabilities to the conditional branches of the
> cloned iterations, and then use the constraints I mentioned above to adjust
> the weights. However, I think it can be simplified as follows:
>
> Suppose the branch weight vector is (WB, WE), where WB is the weight of the
> edge to the loop header and WE is the weight of the edge to the exit block.
> Then the new weights can be something like (WB - m*WE, WE), where m is the
> number of peeled iterations.
>
>
I don't think we really need this simplification - it sounds pretty
straightforward to track the weights while assigning them to the peeled
branches.
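To make it concrete, this is roughly the kind of thing I have in mind for the
peeled copies - just a sketch using your (N - m) * 3 / (4 * N) suggestion; the
helper name is made up and the successor order is an assumption, this is not
the actual patch:

  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/MDBuilder.h"
  #include "llvm/IR/Metadata.h"
  #include <cassert>
  using namespace llvm;

  // Assign branch weights to the latch branch of the m-th peeled iteration,
  // given an estimated trip count N, so that the fall-through probability is
  // (N - m) * 3 / (4 * N). Assumes successor 0 of LatchBR is the next
  // iteration (fall through) and successor 1 is the exit block.
  static void setPeeledLatchWeights(BranchInst *LatchBR, unsigned IterNumber,
                                    unsigned EstTripCount) {
    assert(IterNumber < EstTripCount && "peeling past estimated trip count");
    unsigned FallThroughWeight = 3 * (EstTripCount - IterNumber);
    unsigned ExitWeight = 4 * EstTripCount - FallThroughWeight;
    MDBuilder MDB(LatchBR->getContext());
    LatchBR->setMetadata(
        LLVMContext::MD_prof,
        MDB.createBranchWeights(FallThroughWeight, ExitWeight));
  }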


> [Proof] Assume the fall-through probability of the i-th cloned conditional
> branch is P_i. The header weight of the first cloned iteration is WE,
> so the total edge weight from the cloned iterations to the exit block is
>
>   WE * (1 - P_1*P_2*...*P_m),
>
> and the new exit edge weight of the remaining loop is
>
>   WE * P_1*P_2*...*P_m.
>
> Assuming each P_i is close to 1, this approximates to WE.
>
>
Even for the narrow normal distribution case, at least P_m won't be close
to 1. I'm not sure this matters in practice - but since tracking the
weights doesn't sound hard, I'll try that first.
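Just to make sure we're talking about the same bookkeeping, with made-up
numbers: if the latch's !prof weights are (WB, WE) = (97, 3), i.e. roughly 33
iterations on average, and we peel 3 iterations, your simplified formula
leaves the remaining latch with roughly

  (WB - m*WE, WE) = (97 - 3*3, 3) = (88, 3).

Tracking the flow explicitly would instead subtract the actual number of times
each peeled copy executes (WE, WE*P_1, WE*P_1*P_2, ...) rather than assuming
each copy runs exactly WE times - which only matters when the P_i aren't all
close to 1.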


> Similarly, the new header weight of the loop is about (WB - m*WE)
>
>
>
>>
>> We could assume something like a uniform distribution between, say, 0 and
>> 2 * N iterations (in which case the fall-through probability is, I think, (2
>> * N - m - 1) / (2 * N), and the backedge probability is something like 1 -
>> 1/(1.5 * N) )  - but I don't know if that's realistic either.
>>
>
> I am not sure making such an assumption about the distribution is a
> reasonable thing to do. I think it is more reasonable to assume a narrower
> distribution and adjust the weights in a simple way (we are not doing
> anything worse than what is already happening today).
>
> David
>
>
>
>>
>>
>>> Add some fuzzing factor to avoid creating extremely biased branch
>>> probabilities:
>>>
>>> for instance, (N - m) * 3 / (4 * N)
>>>
>>>
>>> https://reviews.llvm.org/D25963
>>>
>>>
>>>
>>>
>>
>