[PATCH] D24118: [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info

Fri Sep 2 09:31:02 PDT 2016

On Fri, Sep 2, 2016 at 6:32 AM, Duncan Exon Smith <dexonsmith at apple.com> wrote:
>
> On Sep 1, 2016, at 18:23, Adam Nemet <anemet at apple.com> wrote:
>
> anemet added a comment.
>
> In https://reviews.llvm.org/D24118#532156, @davidxl wrote:
>
> There is a fundamental problem in BFI: it cannot handle a weight of 0. To
> work around it:
>
>
> 1. The FE PGO annotator unconditionally adds 1 to the weights of both
> targets when annotating the branch.
>
> 2. BFI always adds 1 to the weight if it is zero.
>
>
>  The end result is that
>
> 3. we will never see a code region annotated with a zero frequency/count
>
>
>
> Ah, that totally explains what's happening here.  Is there a PR?
>
> 2. for FE PGO, all loop trip counts appear to be half of the real trip
> count.
>
>
>
> Wow, this was going to be next thing for me to investigate, thanks for the
> insights!
>
>
> This is not true though.  For FE PGO, the loop trip count is only halved for
> loops that are covered once.

OK, a more precise statement is that for loops that are entered less
frequently (including very hot, high-trip-count ones), the trip count
derived from the profile will look much smaller than the real one.
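
As a rough illustration of the effect, here is a simplified model, not the
actual clang or BFI code.  It assumes a loop is entered `entries` times and
iterates `iters` times per entry, so the back-edge count is about
entries * iters and the exit count is entries, and that 1 is added to both
counts as described earlier in the thread.  The function name and the use of
the weight ratio as the "apparent" trip count are assumptions made for this
sketch.

```python
from fractions import Fraction


def apparent_trip_count(entries, iters, bias=1):
    """Ratio of the biased branch weights on a loop's latch branch.

    Simplified model (an assumption, not the clang implementation):
    the back edge is taken entries * iters times, the exit edge is
    taken entries times, and `bias` is added to both counts when they
    are turned into branch weights.
    """
    backedge_weight = entries * iters + bias
    exit_weight = entries + bias
    return Fraction(backedge_weight, exit_weight)


# Same real trip count (100), different numbers of loop entries.
for entries in (1, 10, 1000):
    est = apparent_trip_count(entries, iters=100)
    print(f"entries={entries:5d}  real=100  apparent={float(est):.1f}")
```

In this model a loop entered once looks like it has about half its real trip
count (101/2), while the same loop entered 1000 times is barely affected.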

>
> If a loop has good coverage, the trip count is barely adjusted.  The
> adjustment has to do with confidence.

Some hot loops may not be entered frequently.  We have debated this
before: people may want to prune their training data so that the
training run-time overhead is low, and those users will be affected the
most by this heuristic.

Another strange behavior: if you merge two identical profiles into one,
the compiler will all of a sudden see different loop trip counts (the
change can be large).
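
A small sketch of why merging matters, under two assumptions: merging
identical profiles simply sums the raw counts, and 1 is added to each count
only afterwards, when the branch weights are written.  The helper names and
the example counts (back edge taken 1000 times, exit taken once) are
hypothetical.

```python
def merged_weights(backedge_count, exit_count, copies=1, bias=1):
    """Branch weights after merging `copies` identical profiles.

    Assumed model: merging sums the raw counts, and `bias` is added
    once to each summed count when it becomes a branch weight.
    """
    return backedge_count * copies + bias, exit_count * copies + bias


def ratio(weights):
    """Back-edge weight divided by exit weight."""
    taken, not_taken = weights
    return taken / not_taken


once = merged_weights(1000, 1, copies=1)   # weights (1001, 2)
twice = merged_weights(1000, 1, copies=2)  # weights (2001, 3)
print(ratio(once), ratio(twice))
```

With these numbers the ratio moves from about 500 to about 667 even though
the program's behavior is unchanged; with a bias of 0 the two ratios would
be identical.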

>
> I'm not sure "adding 1" is the best heuristic, but I'm fairly convinced that
> "adding 0" is worse.  I believe Jakob linked to a source for his choice of
> +1 if you want to read up on it (look in the comments in CFE).

I will probably collect some data and do some analysis at some point.
Without data, we cannot prove anything.  I can easily add support for
loop trip count value profiling and see how loop trip counts are
distributed in general.

thanks,

David

>
> Separately from adjusting heuristics, perhaps loops with statically known
> trip counts should be given special treatment.  Why rely on statistical
> heuristics if SCEV knows a loop counts to 10?  IIRC, loops with no coverage
> are using a fixed heuristic (see BPI) for loop trip counts, something like
> "10x".  We could have a pass that rewrote branch weights based on SCEV
> results instead (or use SCEV in BPI).
>
>
> https://reviews.llvm.org/D24118
>
>
>