[PATCH] D28593: Update loop branch_weight metadata after loop rotation.

Thu Jan 12 16:23:53 PST 2017

On Thu, Jan 12, 2017 at 4:21 PM, Michael Kuperstein <mkuper at google.com> wrote:
> Peeling just the threshold in general doesn't sound appealing (sorry :-) )
> Let's say you have a loop with 1000 iterations, and the thresholds says you
> should peel the first 10. And let's say peeling those 10 off is really
> beneficial, because we can simplify the code in those 10. The means we
> improved the performance of  1/100 of the actual dynamic iterations (first
> 10 out of 100), but blew up the code size by a factor of 10. Not a good
> trade off.

Make sense.
>
> We could have a "roll-off" instead of a hard threshold, but I think we'd
> need some evidence this is a real problem to justify that.
>
> On Thu, Jan 12, 2017 at 4:18 PM, Xin Tong <trent.xin.tong at gmail.com> wrote:
>>
>> So there are really 3 cases.
>>
>> case 1: est < real < threshold => we should peel more.
>> case 2: est < threshold < real => we would not peel with the real
>> tripcount, but we peeled with the underestimated tripcount. we are
>> being aggressive here. Michael, have you considered the possibility of
>> peeling just the threshold ?
>> case 3: threshold < est < real => we do not peel. this is fine.
>> -Xin
>>
>> On Thu, Jan 12, 2017 at 4:14 PM, Xin Tong <trent.xin.tong at gmail.com>
>> wrote:
>> > On Thu, Jan 12, 2017 at 4:09 PM, Xin Tong <trent.xin.tong at gmail.com>
>> > wrote:
>> >> On Thu, Jan 12, 2017 at 3:37 PM, Michael Kuperstein <mkuper at google.com>
>> >> wrote:
>> >>>
>> >>>
>> >>> On Thu, Jan 12, 2017 at 2:55 PM, Xin Tong <trent.xin.tong at gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> On Thu, Jan 12, 2017 at 2:40 PM, Dehao Chen <danielcdh at gmail.com>
>> >>>> wrote:
>> >>>> > Are you suggesting changing it to:
>> >>>> >
>> >>>> > rotate_update_probability(float orig_prob) {
>> >>>> >   if (orig_prob >= 0 && orig_prob < 0.5) {
>> >>>> >     branch_prob_inside = 0;
>> >>>> >     branch_prob_outside = orig_prob;
>> >>>> >   } else {
>> >>>> >     branch_prob_inside = 2 - 1/orig_prob;
>> >>>> >     branch_prob_outside = 1;
>> >>>> >   }
>> >>>> > }
>> >>>>
>> >>>> This would not work for the case which the orig_prob is bigger than
>> >>>> 0.5. branch_prob_outside tells me when the guard block is hit, it
>> >>>> will
>> >>>> always go to the exit.
>> >>>> >
>> >>>> > int estimate_trip_count(float latch_probability) {
>> >>>> >    if (latch_probability == 0)
>> >>>> >      return 0;
>> >>>> >    else
>> >>>> >      return 1/(1-latch_probability)
>> >>>> > }
>> >>>> >
>> >>>> > My gut feeling tells me this is a little dangerous. Micheal, what
>> >>>> > do you
>> >>>> > think?
>> >>>> >
>> >>>> > Dehao
>> >>>>
>> >>>> My general feeling is.
>> >>>>
>> >>>> 1. We do not have all the information to do a very precise
>> >>>> distribution of the weight, and we will unlikely to be able to get
>> >>>> such information in short term (we need value profiling at least I
>> >>>> think or some other more clever approach). we made a decision to use
>> >>>> average trip count for loop peeling, assuming the loop tripcounts are
>> >>>> not polarized. Loop peeling will probably not be as effective if it
>> >>>> is.  So we make the same assumption when we rotate the loops and
>> >>>> update the !prof metadata.
>> >>>
>> >>>
>> >>> I agree.
>> >>>
>> >>>>
>> >>>> 2. We also consider the possibility of this assumption falling out,
>> >>>> in
>> >>>> the case of loop peel, we basically peeled for some gains when the
>> >>>> iteration has a high trip count and some loss  (increase code size)
>> >>>> when the iterations has low trip count. I feel this is acceptable.
>> >>>> In
>> >>>> case of metadata update in loop rotation, if loop does have a very
>> >>>> polarized distribution. i.e. cases where a few iterations takes all
>> >>>> the trip counts and rest have close to zero. We basically
>> >>>> underestimate the weight of the branch from guard block to exit
>> >>>> block,
>> >>>> as there should be more iterations of the loop that exits before even
>> >>>> one iteration is executed. This will propagate into the CFG and
>> >>>> result
>> >>>> in us underestimating the average tripcount of the loops when it
>> >>>> actually executes (i.e. we have fewer iterations responsible for all
>> >>>> these body weight). This is OK, we become less aggressive in peeling,
>> >>>> we left some performance opportunities on the table, but only if the
>> >>>> loop distribution is polarized. And we may do a slightly worse job in
>> >>>> block placement.
>> >>>
>> >>>
>> >>> That's not exactly right. Underestimating can make us more aggressive,
>> >>> not
>> >>> less.
>> >>> Because we use a hard threshold for when to peel, underestimating the
>> >>> trip
>> >>> count
>> >>> can cause us to be more aggressive. I'm not sure this is a big
>> >>> problem,
>> >>> because
>> >>> if the threshold is chosen correctly, choosing to peel or not to peel
>> >>> when
>> >>> we're very
>> >>> close to the threshold should be a wash.
>> >>> Admittedly, that's a pretty big if. :-) But I don't think it's a huge
>> >>> problem.
>> >>
>> >> If the actual tripcount of the loop is 10 and we estimated to be 5, we
>> >> will peel it 5 iterations.
>> >> Its only when the
>> >>
>> >> https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/LoopUnrollPeel.cpp#L95
>> >> is false,  we do not peel, then if the underestimation breaks this
>> >> condition, the real number (bigger) will sure not cause the loop to be
>> >> peeled.
>> >>
>> >> i.e. we will never peel more than this threshold. so I feel we are
>> >> leaving opportunities on the table, instead of being too
>> >> aggressive.
>> >>
>> >> -Xin
>> >
>> > Ok, there is also the possiblity that we underestimate and peel a loop
>> > which we would not peel given the right tripcount.  I think thats what
>> > you meant.
>> > -Xin
>> >>>
>> >>> Dehao, why do you think this is dangerous? It's certainly somewhat
>> >>> ugly,
>> >>> but that's a separate issue.
>> >>>
>> >>>>
>> >>>> 3. In case the loop distribution is close to each other, the patch
>> >>>> handles it and distributes the weight in a reasonable way I feel.
>> >>>>
>> >>>> -Xin
>> >>>> >
>> >>>> > On Thu, Jan 12, 2017 at 2:33 PM, Xin Tong
>> >>>> > <trent.xin.tong at gmail.com>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> On Thu, Jan 12, 2017 at 2:19 PM, Dehao Chen <danielcdh at gmail.com>
>> >>>> >> wrote:
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > On Thu, Jan 12, 2017 at 2:14 PM, Xin Tong
>> >>>> >> > <trent.xin.tong at gmail.com>
>> >>>> >> > wrote:
>> >>>> >> >>
>> >>>> >> >> On Thu, Jan 12, 2017 at 2:10 PM, Dehao Chen
>> >>>> >> >> <danielcdh at gmail.com>
>> >>>> >> >> wrote:
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > On Thu, Jan 12, 2017 at 2:02 PM, Xin Tong
>> >>>> >> >> > <trent.xin.tong at gmail.com>
>> >>>> >> >> > wrote:
>> >>>> >> >> >>
>> >>>> >> >> >> On Thu, Jan 12, 2017 at 1:34 PM, Dehao Chen
>> >>>> >> >> >> <danielcdh at gmail.com>
>> >>>> >> >> >> wrote:
>> >>>> >> >> >> > If we assume that the trip count will always be the same,
>> >>>> >> >> >> > then
>> >>>> >> >> >> > if
>> >>>> >> >> >> > back
>> >>>> >> >> >> > edge
>> >>>> >> >> >> > taken probability falls in the range of (0, 0.5), then it
>> >>>> >> >> >> > already
>> >>>> >> >> >> > broke
>> >>>> >> >> >> > this
>> >>>> >> >> >> > assumption. So we do not have enough info to infer both
>> >>>> >> >> >> > branch
>> >>>> >> >> >> > probabilities. So probably we should simply set the inner
>> >>>> >> >> >> > probability
>> >>>> >> >> >> > as
>> >>>> >> >> >> > "0". And when we estimate loop trip count, we will always
>> >>>> >> >> >> > return 0
>> >>>> >> >> >> > if
>> >>>> >> >> >> > the
>> >>>> >> >> >> > latch probability is 0. Something like:
>> >>>> >> >> >> >
>> >>>> >> >> >> > rotate_update_probability(float orig_prob) {
>> >>>> >> >> >> >   branch_prob_outside = orig_prob;
>> >>>> >> >> >> >   if (orig_prob >= 0 && orig_prob < 0.5) {
>> >>>> >> >> >> >     branch_prob_inside = 0;
>> >>>> >> >> >> >   } else {
>> >>>> >> >> >> >     branch_prob_inside = 2 - 1/orig_prob;
>> >>>> >> >> >> >   }
>> >>>> >> >> >> > }
>> >>>> >> >> >> >
>> >>>> >> >> >> > int estimate_trip_count(float latch_probability) {
>> >>>> >> >> >> >    if (latch_probability == 0)
>> >>>> >> >> >> >      return 0;
>> >>>> >> >> >> >    else
>> >>>> >> >> >> >      return 1/(1-latch_probability)
>> >>>> >> >> >> > }
>> >>>> >> >> >>
>> >>>> >> >> >> We assume the trip count will be close to each other, they
>> >>>> >> >> >> do not
>> >>>> >> >> >> have
>> >>>> >> >> >> to be identical =). And that assumption has implications, I
>> >>>> >> >> >> addressed
>> >>>> >> >> >> the implications when we have polarized loop tripcounts in
>> >>>> >> >> >> an
>> >>>> >> >> >> earlier
>> >>>> >> >> >> response. I feel we need to make assumptions and also reason
>> >>>> >> >> >> about
>> >>>> >> >> >> the
>> >>>> >> >> >> fallouts if that assumption turns out to be not true as
>> >>>> >> >> >> Michael
>> >>>> >> >> >> said.
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > Yes, the above code is trying to address one case where the
>> >>>> >> >> > assumption
>> >>>> >> >> > is
>> >>>> >> >> > untrue.
>> >>>> >> >> >
>> >>>> >> >> >>
>> >>>> >> >> >>
>> >>>> >> >> >> Back to your example, I am not sure i understand it fully,
>> >>>> >> >> >> is
>> >>>> >> >> >> orig_prob the probability of branching outside before
>> >>>> >> >> >> rotation ?
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > Yes, orig_prob is the original probability of the branch
>> >>>> >> >> > before
>> >>>> >> >> > loop
>> >>>> >> >> > rotation.
>> >>>> >> >>
>> >>>> >> >> If orig_prob == 0.5, the branch_prob_inside is 0.
>> >>>> >> >> (branch_prob_inside
>> >>>> >> >> is the probability that the loop body is branched to before
>> >>>> >> >> rotation
>> >>>> >> >> ?).
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > Sorry about the confusion. Let me use my original example:
>> >>>> >> >
>> >>>> >> > Before loop rotation
>> >>>> >> >
>> >>>> >> > L1:
>> >>>> >> > if (cond) { // "orig_prob"
>> >>>> >> >   stmt;
>> >>>> >> >   goto L1;
>> >>>> >> > }
>> >>>> >> >
>> >>>> >> > After loop rotation:
>> >>>> >> >
>> >>>> >> > if (cond) { // "branch_prob_outside"
>> >>>> >> > L1:
>> >>>> >> >   stmt;
>> >>>> >> >   if (cond) { // "branch_prob_inside" // and also
>> >>>> >> > "latch_probability"
>> >>>> >> >     goto L1;
>> >>>> >> >   }
>> >>>> >> > }
>> >>>> >> >
>> >>>> >> > Note that all probability are for when "cond" is true.
>> >>>> >> >
>> >>>> >> > Thanks,
>> >>>> >> > Dehao
>> >>>> >>
>> >>>> >> Thanks for the code. One thing I notice is that
>> >>>> >> branch_prob_outside
>> >>>> >> probably should not be orig_prob. orig_prob before rotation is the
>> >>>> >> (#
>> >>>> >> of times the exit is taken)/(# of times the header branch is hit
>> >>>> >> through forward and back edges). So if the loop has a cmp i32
>> >>>> >> %index,
>> >>>> >> 127 and recurrence of +1 (index = index + 1), i.e. the loop has a
>> >>>> >> trip
>> >>>> >> count of 128, then branch_prob_outside = orig_prob = 1/128.
>> >>>> >> However,
>> >>>> >> branch_prob_outside should really be 0 as the loop will always
>> >>>> >> execute, i..e branch_prob_outside in the guard block will never be
>> >>>> >> taken.
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >>
>> >>>> >> >> Thanks,
>> >>>> >> >> -Xin
>> >>>> >> >> >
>> >>>> >> >> >>
>> >>>> >> >> >>
>> >>>> >> >> >> Thanks,
>> >>>> >> >> >> -Xin
>> >>>> >> >> >>
>> >>>> >> >> >> >
>> >>>> >> >> >> >
>> >>>> >> >> >> > On Thu, Jan 12, 2017 at 12:59 PM, Michael Kuperstein via
>> >>>> >> >> >> > Phabricator
>> >>>> >> >> >> > <reviews at reviews.llvm.org> wrote:
>> >>>> >> >> >> >>
>> >>>> >> >> >> >> mkuper added inline comments.
>> >>>> >> >> >> >>
>> >>>> >> >> >> >>
>> >>>> >> >> >> >> ================
>> >>>> >> >> >> >> Comment at: lib/Transforms/Scalar/LoopRotation.cpp:476
>> >>>> >> >> >> >> +  // data.
>> >>>> >> >> >> >> +  if (!updateLoopEstimatedBranchWeight(L, GBI, BI /*
>> >>>> >> >> >> >> OrigHeader
>> >>>> >> >> >> >> BR
>> >>>> >> >> >> >> */)) {
>> >>>> >> >> >> >> +    BI->setMetadata(LLVMContext::MD_prof, nullptr);
>> >>>> >> >> >> >> ----------------
>> >>>> >> >> >> >> Also, regardless of the rest of the discussion - I don't
>> >>>> >> >> >> >> think
>> >>>> >> >> >> >> we
>> >>>> >> >> >> >> should
>> >>>> >> >> >> >> drop the metadata on the floor if we fail.
>> >>>> >> >> >> >>
>> >>>> >> >> >> >> I don't think "No data is better than imprecise data" is
>> >>>> >> >> >> >> right
>> >>>> >> >> >> >> in
>> >>>> >> >> >> >> the
>> >>>> >> >> >> >> general case, but that's arguable. Specifically here,
>> >>>> >> >> >> >> though,
>> >>>> >> >> >> >> we're
>> >>>> >> >> >> >> imprecise even if updateLoopEstimatedBranchWeight()
>> >>>> >> >> >> >> succeeds,
>> >>>> >> >> >> >> because
>> >>>> >> >> >> >> of the
>> >>>> >> >> >> >> assumptions we make on the distribution.
>> >>>> >> >> >> >>
>> >>>> >> >> >> >>
>> >>>> >> >> >> >> https://reviews.llvm.org/D28593
>> >>>> >> >> >> >>
>> >>>> >> >> >> >>
>> >>>> >> >> >> >>
>> >>>> >> >> >> >
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >
>> >>>> >
>> >>>
>> >>>
>
>