[PATCH] D28593: Update loop branch_weight metadata after loop rotation.

Thu Jan 12 16:09:16 PST 2017

On Thu, Jan 12, 2017 at 3:37 PM, Michael Kuperstein <mkuper at google.com> wrote:
>
>
> On Thu, Jan 12, 2017 at 2:55 PM, Xin Tong <trent.xin.tong at gmail.com> wrote:
>>
>> On Thu, Jan 12, 2017 at 2:40 PM, Dehao Chen <danielcdh at gmail.com> wrote:
>> > Are you suggesting changing it to:
>> >
>> > rotate_update_probability(float orig_prob) {
>> >   if (orig_prob >= 0 && orig_prob < 0.5) {
>> >     branch_prob_inside = 0;
>> >     branch_prob_outside = orig_prob;
>> >   } else {
>> >     branch_prob_inside = 2 - 1/orig_prob;
>> >     branch_prob_outside = 1;
>> >   }
>> > }
>>
>> This would not work for the case which the orig_prob is bigger than
>> 0.5. branch_prob_outside tells me when the guard block is hit, it will
>> always go to the exit.
>> >
>> > int estimate_trip_count(float latch_probability) {
>> >    if (latch_probability == 0)
>> >      return 0;
>> >    else
>> >      return 1/(1-latch_probability)
>> > }
>> >
>> > My gut feeling tells me this is a little dangerous. Micheal, what do you
>> > think?
>> >
>> > Dehao
>>
>> My general feeling is.
>>
>> 1. We do not have all the information to do a very precise
>> distribution of the weight, and we will unlikely to be able to get
>> such information in short term (we need value profiling at least I
>> think or some other more clever approach). we made a decision to use
>> average trip count for loop peeling, assuming the loop tripcounts are
>> not polarized. Loop peeling will probably not be as effective if it
>> is.  So we make the same assumption when we rotate the loops and
>> update the !prof metadata.
>
>
> I agree.
>
>>
>> 2. We also consider the possibility of this assumption falling out, in
>> the case of loop peel, we basically peeled for some gains when the
>> iteration has a high trip count and some loss  (increase code size)
>> when the iterations has low trip count. I feel this is acceptable.  In
>> case of metadata update in loop rotation, if loop does have a very
>> polarized distribution. i.e. cases where a few iterations takes all
>> the trip counts and rest have close to zero. We basically
>> underestimate the weight of the branch from guard block to exit block,
>> as there should be more iterations of the loop that exits before even
>> one iteration is executed. This will propagate into the CFG and result
>> in us underestimating the average tripcount of the loops when it
>> actually executes (i.e. we have fewer iterations responsible for all
>> these body weight). This is OK, we become less aggressive in peeling,
>> we left some performance opportunities on the table, but only if the
>> loop distribution is polarized. And we may do a slightly worse job in
>> block placement.
>
>
> That's not exactly right. Underestimating can make us more aggressive, not
> less.
> Because we use a hard threshold for when to peel, underestimating the trip
> count
> can cause us to be more aggressive. I'm not sure this is a big problem,
> because
> if the threshold is chosen correctly, choosing to peel or not to peel when
> we're very
> close to the threshold should be a wash.
> Admittedly, that's a pretty big if. :-) But I don't think it's a huge
> problem.

If the actual tripcount of the loop is 10 and we estimated to be 5, we
will peel it 5 iterations.
Its only when the
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/LoopUnrollPeel.cpp#L95
is false,  we do not peel, then if the underestimation breaks this
condition, the real number (bigger) will sure not cause the loop to be
peeled.

i.e. we will never peel more than this threshold. so I feel we are
leaving opportunities on the table, instead of being too
aggressive.

-Xin
>
> Dehao, why do you think this is dangerous? It's certainly somewhat ugly,
> but that's a separate issue.
>
>>
>> 3. In case the loop distribution is close to each other, the patch
>> handles it and distributes the weight in a reasonable way I feel.
>>
>> -Xin
>> >
>> > On Thu, Jan 12, 2017 at 2:33 PM, Xin Tong <trent.xin.tong at gmail.com>
>> > wrote:
>> >>
>> >> On Thu, Jan 12, 2017 at 2:19 PM, Dehao Chen <danielcdh at gmail.com>
>> >> wrote:
>> >> >
>> >> >
>> >> > On Thu, Jan 12, 2017 at 2:14 PM, Xin Tong <trent.xin.tong at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Thu, Jan 12, 2017 at 2:10 PM, Dehao Chen <danielcdh at gmail.com>
>> >> >> wrote:
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Jan 12, 2017 at 2:02 PM, Xin Tong
>> >> >> > <trent.xin.tong at gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Thu, Jan 12, 2017 at 1:34 PM, Dehao Chen <danielcdh at gmail.com>
>> >> >> >> wrote:
>> >> >> >> > If we assume that the trip count will always be the same, then
>> >> >> >> > if
>> >> >> >> > back
>> >> >> >> > edge
>> >> >> >> > taken probability falls in the range of (0, 0.5), then it
>> >> >> >> > already
>> >> >> >> > broke
>> >> >> >> > this
>> >> >> >> > assumption. So we do not have enough info to infer both branch
>> >> >> >> > probabilities. So probably we should simply set the inner
>> >> >> >> > probability
>> >> >> >> > as
>> >> >> >> > "0". And when we estimate loop trip count, we will always
>> >> >> >> > return 0
>> >> >> >> > if
>> >> >> >> > the
>> >> >> >> > latch probability is 0. Something like:
>> >> >> >> >
>> >> >> >> > rotate_update_probability(float orig_prob) {
>> >> >> >> >   branch_prob_outside = orig_prob;
>> >> >> >> >   if (orig_prob >= 0 && orig_prob < 0.5) {
>> >> >> >> >     branch_prob_inside = 0;
>> >> >> >> >   } else {
>> >> >> >> >     branch_prob_inside = 2 - 1/orig_prob;
>> >> >> >> >   }
>> >> >> >> > }
>> >> >> >> >
>> >> >> >> > int estimate_trip_count(float latch_probability) {
>> >> >> >> >    if (latch_probability == 0)
>> >> >> >> >      return 0;
>> >> >> >> >    else
>> >> >> >> >      return 1/(1-latch_probability)
>> >> >> >> > }
>> >> >> >>
>> >> >> >> We assume the trip count will be close to each other, they do not
>> >> >> >> have
>> >> >> >> to be identical =). And that assumption has implications, I
>> >> >> >> addressed
>> >> >> >> the implications when we have polarized loop tripcounts in an
>> >> >> >> earlier
>> >> >> >> response. I feel we need to make assumptions and also reason
>> >> >> >> about
>> >> >> >> the
>> >> >> >> fallouts if that assumption turns out to be not true as Michael
>> >> >> >> said.
>> >> >> >
>> >> >> >
>> >> >> > Yes, the above code is trying to address one case where the
>> >> >> > assumption
>> >> >> > is
>> >> >> > untrue.
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> Back to your example, I am not sure i understand it fully, is
>> >> >> >> orig_prob the probability of branching outside before rotation ?
>> >> >> >
>> >> >> >
>> >> >> > Yes, orig_prob is the original probability of the branch before
>> >> >> > loop
>> >> >> > rotation.
>> >> >>
>> >> >> If orig_prob == 0.5, the branch_prob_inside is 0.
>> >> >> (branch_prob_inside
>> >> >> is the probability that the loop body is branched to before rotation
>> >> >> ?).
>> >> >
>> >> >
>> >> > Sorry about the confusion. Let me use my original example:
>> >> >
>> >> > Before loop rotation
>> >> >
>> >> > L1:
>> >> > if (cond) { // "orig_prob"
>> >> >   stmt;
>> >> >   goto L1;
>> >> > }
>> >> >
>> >> > After loop rotation:
>> >> >
>> >> > if (cond) { // "branch_prob_outside"
>> >> > L1:
>> >> >   stmt;
>> >> >   if (cond) { // "branch_prob_inside" // and also "latch_probability"
>> >> >     goto L1;
>> >> >   }
>> >> > }
>> >> >
>> >> > Note that all probability are for when "cond" is true.
>> >> >
>> >> > Thanks,
>> >> > Dehao
>> >>
>> >> Thanks for the code. One thing I notice is that branch_prob_outside
>> >> probably should not be orig_prob. orig_prob before rotation is the (#
>> >> of times the exit is taken)/(# of times the header branch is hit
>> >> through forward and back edges). So if the loop has a cmp i32 %index,
>> >> 127 and recurrence of +1 (index = index + 1), i.e. the loop has a trip
>> >> count of 128, then branch_prob_outside = orig_prob = 1/128.  However,
>> >> branch_prob_outside should really be 0 as the loop will always
>> >> execute, i..e branch_prob_outside in the guard block will never be
>> >> taken.
>> >> >
>> >> >
>> >> >>
>> >> >> Thanks,
>> >> >> -Xin
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> -Xin
>> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Thu, Jan 12, 2017 at 12:59 PM, Michael Kuperstein via
>> >> >> >> > Phabricator
>> >> >> >> > <reviews at reviews.llvm.org> wrote:
>> >> >> >> >>
>> >> >> >> >> mkuper added inline comments.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> ================
>> >> >> >> >> Comment at: lib/Transforms/Scalar/LoopRotation.cpp:476
>> >> >> >> >> +  // data.
>> >> >> >> >> +  if (!updateLoopEstimatedBranchWeight(L, GBI, BI /*
>> >> >> >> >> OrigHeader
>> >> >> >> >> BR
>> >> >> >> >> */)) {
>> >> >> >> >> +    BI->setMetadata(LLVMContext::MD_prof, nullptr);
>> >> >> >> >> ----------------
>> >> >> >> >> Also, regardless of the rest of the discussion - I don't think
>> >> >> >> >> we
>> >> >> >> >> should
>> >> >> >> >> drop the metadata on the floor if we fail.
>> >> >> >> >>
>> >> >> >> >> I don't think "No data is better than imprecise data" is right
>> >> >> >> >> in
>> >> >> >> >> the
>> >> >> >> >> general case, but that's arguable. Specifically here, though,
>> >> >> >> >> we're
>> >> >> >> >> imprecise even if updateLoopEstimatedBranchWeight() succeeds,
>> >> >> >> >> because
>> >> >> >> >> of the
>> >> >> >> >> assumptions we make on the distribution.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> https://reviews.llvm.org/D28593
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>