[PATCH] D28593: Update loop branch_weight metadata after loop rotation.

Thu Jan 12 16:21:58 PST 2017

Peeling just the threshold in general doesn't sound appealing (sorry :-) )
Let's say you have a loop with 1000 iterations, and the thresholds says you
should peel the first 10. And let's say peeling those 10 off is really
beneficial, because we can simplify the code in those 10. The means we
improved the performance of  1/100 of the actual dynamic iterations (first
10 out of 100), but blew up the code size by a factor of 10. Not a good
trade off.

We could have a "roll-off" instead of a hard threshold, but I think we'd
need some evidence this is a real problem to justify that.

On Thu, Jan 12, 2017 at 4:18 PM, Xin Tong <trent.xin.tong at gmail.com> wrote:

> So there are really 3 cases.
>
> case 1: est < real < threshold => we should peel more.
> case 2: est < threshold < real => we would not peel with the real
> tripcount, but we peeled with the underestimated tripcount. we are
> being aggressive here. Michael, have you considered the possibility of
> peeling just the threshold ?
> case 3: threshold < est < real => we do not peel. this is fine.
> -Xin
>
> On Thu, Jan 12, 2017 at 4:14 PM, Xin Tong <trent.xin.tong at gmail.com>
> wrote:
> > On Thu, Jan 12, 2017 at 4:09 PM, Xin Tong <trent.xin.tong at gmail.com>
> wrote:
> >> On Thu, Jan 12, 2017 at 3:37 PM, Michael Kuperstein <mkuper at google.com>
> wrote:
> >>>
> >>>
> >>> On Thu, Jan 12, 2017 at 2:55 PM, Xin Tong <trent.xin.tong at gmail.com>
> wrote:
> >>>>
> >>>> On Thu, Jan 12, 2017 at 2:40 PM, Dehao Chen <danielcdh at gmail.com>
> wrote:
> >>>> > Are you suggesting changing it to:
> >>>> >
> >>>> > rotate_update_probability(float orig_prob) {
> >>>> >   if (orig_prob >= 0 && orig_prob < 0.5) {
> >>>> >     branch_prob_inside = 0;
> >>>> >     branch_prob_outside = orig_prob;
> >>>> >   } else {
> >>>> >     branch_prob_inside = 2 - 1/orig_prob;
> >>>> >     branch_prob_outside = 1;
> >>>> >   }
> >>>> > }
> >>>>
> >>>> This would not work for the case which the orig_prob is bigger than
> >>>> 0.5. branch_prob_outside tells me when the guard block is hit, it will
> >>>> always go to the exit.
> >>>> >
> >>>> > int estimate_trip_count(float latch_probability) {
> >>>> >    if (latch_probability == 0)
> >>>> >      return 0;
> >>>> >    else
> >>>> >      return 1/(1-latch_probability)
> >>>> > }
> >>>> >
> >>>> > My gut feeling tells me this is a little dangerous. Micheal, what
> do you
> >>>> > think?
> >>>> >
> >>>> > Dehao
> >>>>
> >>>> My general feeling is.
> >>>>
> >>>> 1. We do not have all the information to do a very precise
> >>>> distribution of the weight, and we will unlikely to be able to get
> >>>> such information in short term (we need value profiling at least I
> >>>> think or some other more clever approach). we made a decision to use
> >>>> average trip count for loop peeling, assuming the loop tripcounts are
> >>>> not polarized. Loop peeling will probably not be as effective if it
> >>>> is.  So we make the same assumption when we rotate the loops and
> >>>> update the !prof metadata.
> >>>
> >>>
> >>> I agree.
> >>>
> >>>>
> >>>> 2. We also consider the possibility of this assumption falling out, in
> >>>> the case of loop peel, we basically peeled for some gains when the
> >>>> iteration has a high trip count and some loss  (increase code size)
> >>>> when the iterations has low trip count. I feel this is acceptable.  In
> >>>> case of metadata update in loop rotation, if loop does have a very
> >>>> polarized distribution. i.e. cases where a few iterations takes all
> >>>> the trip counts and rest have close to zero. We basically
> >>>> underestimate the weight of the branch from guard block to exit block,
> >>>> as there should be more iterations of the loop that exits before even
> >>>> one iteration is executed. This will propagate into the CFG and result
> >>>> in us underestimating the average tripcount of the loops when it
> >>>> actually executes (i.e. we have fewer iterations responsible for all
> >>>> these body weight). This is OK, we become less aggressive in peeling,
> >>>> we left some performance opportunities on the table, but only if the
> >>>> loop distribution is polarized. And we may do a slightly worse job in
> >>>> block placement.
> >>>
> >>>
> >>> That's not exactly right. Underestimating can make us more aggressive,
> not
> >>> less.
> >>> Because we use a hard threshold for when to peel, underestimating the
> trip
> >>> count
> >>> can cause us to be more aggressive. I'm not sure this is a big problem,
> >>> because
> >>> if the threshold is chosen correctly, choosing to peel or not to peel
> when
> >>> we're very
> >>> close to the threshold should be a wash.
> >>> Admittedly, that's a pretty big if. :-) But I don't think it's a huge
> >>> problem.
> >>
> >> If the actual tripcount of the loop is 10 and we estimated to be 5, we
> >> will peel it 5 iterations.
> >> Its only when the
> >> https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/
> LoopUnrollPeel.cpp#L95
> >> is false,  we do not peel, then if the underestimation breaks this
> >> condition, the real number (bigger) will sure not cause the loop to be
> >> peeled.
> >>
> >> i.e. we will never peel more than this threshold. so I feel we are
> >> leaving opportunities on the table, instead of being too
> >> aggressive.
> >>
> >> -Xin
> >
> > Ok, there is also the possiblity that we underestimate and peel a loop
> > which we would not peel given the right tripcount.  I think thats what
> > you meant.
> > -Xin
> >>>
> >>> Dehao, why do you think this is dangerous? It's certainly somewhat
> ugly,
> >>> but that's a separate issue.
> >>>
> >>>>
> >>>> 3. In case the loop distribution is close to each other, the patch
> >>>> handles it and distributes the weight in a reasonable way I feel.
> >>>>
> >>>> -Xin
> >>>> >
> >>>> > On Thu, Jan 12, 2017 at 2:33 PM, Xin Tong <trent.xin.tong at gmail.com
> >
> >>>> > wrote:
> >>>> >>
> >>>> >> On Thu, Jan 12, 2017 at 2:19 PM, Dehao Chen <danielcdh at gmail.com>
> >>>> >> wrote:
> >>>> >> >
> >>>> >> >
> >>>> >> > On Thu, Jan 12, 2017 at 2:14 PM, Xin Tong <
> trent.xin.tong at gmail.com>
> >>>> >> > wrote:
> >>>> >> >>
> >>>> >> >> On Thu, Jan 12, 2017 at 2:10 PM, Dehao Chen <
> danielcdh at gmail.com>
> >>>> >> >> wrote:
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > On Thu, Jan 12, 2017 at 2:02 PM, Xin Tong
> >>>> >> >> > <trent.xin.tong at gmail.com>
> >>>> >> >> > wrote:
> >>>> >> >> >>
> >>>> >> >> >> On Thu, Jan 12, 2017 at 1:34 PM, Dehao Chen <
> danielcdh at gmail.com>
> >>>> >> >> >> wrote:
> >>>> >> >> >> > If we assume that the trip count will always be the same,
> then
> >>>> >> >> >> > if
> >>>> >> >> >> > back
> >>>> >> >> >> > edge
> >>>> >> >> >> > taken probability falls in the range of (0, 0.5), then it
> >>>> >> >> >> > already
> >>>> >> >> >> > broke
> >>>> >> >> >> > this
> >>>> >> >> >> > assumption. So we do not have enough info to infer both
> branch
> >>>> >> >> >> > probabilities. So probably we should simply set the inner
> >>>> >> >> >> > probability
> >>>> >> >> >> > as
> >>>> >> >> >> > "0". And when we estimate loop trip count, we will always
> >>>> >> >> >> > return 0
> >>>> >> >> >> > if
> >>>> >> >> >> > the
> >>>> >> >> >> > latch probability is 0. Something like:
> >>>> >> >> >> >
> >>>> >> >> >> > rotate_update_probability(float orig_prob) {
> >>>> >> >> >> >   branch_prob_outside = orig_prob;
> >>>> >> >> >> >   if (orig_prob >= 0 && orig_prob < 0.5) {
> >>>> >> >> >> >     branch_prob_inside = 0;
> >>>> >> >> >> >   } else {
> >>>> >> >> >> >     branch_prob_inside = 2 - 1/orig_prob;
> >>>> >> >> >> >   }
> >>>> >> >> >> > }
> >>>> >> >> >> >
> >>>> >> >> >> > int estimate_trip_count(float latch_probability) {
> >>>> >> >> >> >    if (latch_probability == 0)
> >>>> >> >> >> >      return 0;
> >>>> >> >> >> >    else
> >>>> >> >> >> >      return 1/(1-latch_probability)
> >>>> >> >> >> > }
> >>>> >> >> >>
> >>>> >> >> >> We assume the trip count will be close to each other, they
> do not
> >>>> >> >> >> have
> >>>> >> >> >> to be identical =). And that assumption has implications, I
> >>>> >> >> >> addressed
> >>>> >> >> >> the implications when we have polarized loop tripcounts in an
> >>>> >> >> >> earlier
> >>>> >> >> >> response. I feel we need to make assumptions and also reason
> >>>> >> >> >> about
> >>>> >> >> >> the
> >>>> >> >> >> fallouts if that assumption turns out to be not true as
> Michael
> >>>> >> >> >> said.
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > Yes, the above code is trying to address one case where the
> >>>> >> >> > assumption
> >>>> >> >> > is
> >>>> >> >> > untrue.
> >>>> >> >> >
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >> >> Back to your example, I am not sure i understand it fully, is
> >>>> >> >> >> orig_prob the probability of branching outside before
> rotation ?
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > Yes, orig_prob is the original probability of the branch
> before
> >>>> >> >> > loop
> >>>> >> >> > rotation.
> >>>> >> >>
> >>>> >> >> If orig_prob == 0.5, the branch_prob_inside is 0.
> >>>> >> >> (branch_prob_inside
> >>>> >> >> is the probability that the loop body is branched to before
> rotation
> >>>> >> >> ?).
> >>>> >> >
> >>>> >> >
> >>>> >> > Sorry about the confusion. Let me use my original example:
> >>>> >> >
> >>>> >> > Before loop rotation
> >>>> >> >
> >>>> >> > L1:
> >>>> >> > if (cond) { // "orig_prob"
> >>>> >> >   stmt;
> >>>> >> >   goto L1;
> >>>> >> > }
> >>>> >> >
> >>>> >> > After loop rotation:
> >>>> >> >
> >>>> >> > if (cond) { // "branch_prob_outside"
> >>>> >> > L1:
> >>>> >> >   stmt;
> >>>> >> >   if (cond) { // "branch_prob_inside" // and also
> "latch_probability"
> >>>> >> >     goto L1;
> >>>> >> >   }
> >>>> >> > }
> >>>> >> >
> >>>> >> > Note that all probability are for when "cond" is true.
> >>>> >> >
> >>>> >> > Thanks,
> >>>> >> > Dehao
> >>>> >>
> >>>> >> Thanks for the code. One thing I notice is that branch_prob_outside
> >>>> >> probably should not be orig_prob. orig_prob before rotation is the
> (#
> >>>> >> of times the exit is taken)/(# of times the header branch is hit
> >>>> >> through forward and back edges). So if the loop has a cmp i32
> %index,
> >>>> >> 127 and recurrence of +1 (index = index + 1), i.e. the loop has a
> trip
> >>>> >> count of 128, then branch_prob_outside = orig_prob = 1/128.
> However,
> >>>> >> branch_prob_outside should really be 0 as the loop will always
> >>>> >> execute, i..e branch_prob_outside in the guard block will never be
> >>>> >> taken.
> >>>> >> >
> >>>> >> >
> >>>> >> >>
> >>>> >> >> Thanks,
> >>>> >> >> -Xin
> >>>> >> >> >
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >> >> Thanks,
> >>>> >> >> >> -Xin
> >>>> >> >> >>
> >>>> >> >> >> >
> >>>> >> >> >> >
> >>>> >> >> >> > On Thu, Jan 12, 2017 at 12:59 PM, Michael Kuperstein via
> >>>> >> >> >> > Phabricator
> >>>> >> >> >> > <reviews at reviews.llvm.org> wrote:
> >>>> >> >> >> >>
> >>>> >> >> >> >> mkuper added inline comments.
> >>>> >> >> >> >>
> >>>> >> >> >> >>
> >>>> >> >> >> >> ================
> >>>> >> >> >> >> Comment at: lib/Transforms/Scalar/LoopRotation.cpp:476
> >>>> >> >> >> >> +  // data.
> >>>> >> >> >> >> +  if (!updateLoopEstimatedBranchWeight(L, GBI, BI /*
> >>>> >> >> >> >> OrigHeader
> >>>> >> >> >> >> BR
> >>>> >> >> >> >> */)) {
> >>>> >> >> >> >> +    BI->setMetadata(LLVMContext::MD_prof, nullptr);
> >>>> >> >> >> >> ----------------
> >>>> >> >> >> >> Also, regardless of the rest of the discussion - I don't
> think
> >>>> >> >> >> >> we
> >>>> >> >> >> >> should
> >>>> >> >> >> >> drop the metadata on the floor if we fail.
> >>>> >> >> >> >>
> >>>> >> >> >> >> I don't think "No data is better than imprecise data" is
> right
> >>>> >> >> >> >> in
> >>>> >> >> >> >> the
> >>>> >> >> >> >> general case, but that's arguable. Specifically here,
> though,
> >>>> >> >> >> >> we're
> >>>> >> >> >> >> imprecise even if updateLoopEstimatedBranchWeight()
> succeeds,
> >>>> >> >> >> >> because
> >>>> >> >> >> >> of the
> >>>> >> >> >> >> assumptions we make on the distribution.
> >>>> >> >> >> >>
> >>>> >> >> >> >>
> >>>> >> >> >> >> https://reviews.llvm.org/D28593
> >>>> >> >> >> >>
> >>>> >> >> >> >>
> >>>> >> >> >> >>
> >>>> >> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >
> >>>> >> >
> >>>> >
> >>>> >
> >>>
> >>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170112/3904e620/attachment.html>