[PATCH] D25963: [LoopUnroll] Implement profile-based loop peeling

Tue Oct 25 20:51:11 PDT 2016

mkuper added a comment.

In https://reviews.llvm.org/D25963#579328, @hfinkel wrote:

> As a high-level comment, it would be nice to also have loop metadata to specify a typical trip count (or trip counts).
>
> Intel, for example, has (https://software.intel.com/en-us/node/524502):
>
>   #pragma loop_count(n)
>   
>
> which asks the optimizer to optimize for a trip count of n. Moreover, and perhaps more importantly, it also supports:
>
>   #pragma loop_count(n1, n2, ...)
>   
>
> which asks for specializations for trip counts n1, n2, etc.
>
> Also supported by Intel's compiler is:
>
>   #pragma loop_count min(n),max(n),avg(n)
>   

I agree this would be nice, but I think it's somewhat orthogonal.
We can start with an implementation of "estimated trip count" that relies on branch weights, and refine to use more specialized metadata if/when we have it.
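
To make "estimated trip count from branch weights" concrete, here is a minimal sketch. It is not the code in the patch and not an LLVM API: `estimateTripCount` and its parameters are hypothetical names, and it assumes the two weights on the loop latch branch are already known. The ratio of the backedge weight to the exit weight approximates how often the backedge is taken per loop entry, and the trip count is one more than that.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not LLVM API).
// backedgeWeight: profile weight of the latch branch edge that re-enters the loop.
// exitWeight:     profile weight of the latch branch edge that leaves the loop.
// Returns the estimated average trip count, or 0 when no estimate is possible.
uint64_t estimateTripCount(uint32_t backedgeWeight, uint32_t exitWeight) {
  if (exitWeight == 0)
    return 0; // The profile never saw the loop exit; nothing to estimate from.

  // Average number of times the backedge is taken per loop entry,
  // rounded to the nearest integer. Widening to 64 bits avoids overflow.
  uint64_t backedgeTaken =
      (static_cast<uint64_t>(backedgeWeight) + exitWeight / 2) / exitWeight;

  // A loop whose backedge is taken N times runs its body N + 1 times.
  return backedgeTaken + 1;
}
```

With weights of 300 (backedge) and 100 (exit), this gives an estimate of 4 iterations per loop entry.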

> FWIW, obviously part of the problem with the average is that you might miss the common trip counts. A loop that is generally executed with a trip count of 3 or 5 might end up with an average near 4; I'm not sure what the best thing would be to do in that case.

Right, but at least for sampling-based PGO, I think average is the best we're going to get. (Instrumentation can probably do better, and user hints certainly can).
I'm not entirely sure this is a problem, though. We want to optimize for the common case, and I think the average gives us that. In the "0.5 * 3 + 0.5 * 5" case, if we peel off 4 iterations, then roughly 90% (7 out of 8) of the dynamically executed iterations will hit the peeled-off section: all iterations of the "3 trips" case, and 4 out of 5 iterations of the "5 trips" case. Which is hopefully better than leaving the loop as is.
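
The arithmetic behind that estimate can be checked with a small sketch. This is only an illustration, not part of the patch; `peeledIterationFraction` is a hypothetical name, and the distribution of trip counts is assumed to be known exactly, which a sampling profile would not give us.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (an assumption for illustration).
// dist: pairs of (trip count, probability that a loop entry has that trip count).
// peelCount: number of iterations peeled off in front of the loop.
// Returns the fraction of dynamically executed iterations that run in the
// peeled-off copies rather than in the remaining loop.
double peeledIterationFraction(
    const std::vector<std::pair<uint64_t, double>> &dist, uint64_t peelCount) {
  double total = 0.0;  // Expected iterations per loop entry.
  double peeled = 0.0; // Expected iterations that land in peeled copies.
  for (const auto &entry : dist) {
    uint64_t trip = entry.first;
    double prob = entry.second;
    total += prob * static_cast<double>(trip);
    // An entry with `trip` iterations runs at most `peelCount` of them peeled.
    peeled += prob * static_cast<double>(std::min(trip, peelCount));
  }
  assert(total > 0.0 && "distribution must contain at least one iteration");
  return peeled / total;
}
```

For the distribution {(3, 0.5), (5, 0.5)} and a peel count of 4, the expected total is 4 iterations per entry, of which 3.5 are peeled, so the function returns 0.875.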

https://reviews.llvm.org/D25963