[LLVMdev] Separating loop nests based on profile information?

Mon Jan 12 19:27:42 PST 2015

On Wed, Jan 7, 2015 at 5:19 PM, Philip Reames <listmail at philipreames.com>
wrote:

>  I've been playing with approaches to getting better optimization of loops
> which contain infrequently executed slow paths.  I've gotten as far as
> throwing together a proof of concept implementation of a profile guided
> optimization to separate a single loop with multiple latches into a loop
> nest, but I want to get feedback from interested parties before investing
> much more effort.
>
> The primary case I'm concerned about looks something like this C example:
> while( some_condition )
>   //do lots of work here
>   if (!some_rare_condition) { <-- This is loop variant
>     helper();
>   }
> }
>
> The key point here is that the 'do lots of work' section contains things
> we could lift out of the loop, but our knowledge of memory gets completely
> destoryed by the 'helper' call which doesn't get inlined.  This ends up
> crippling LICM in particular, and GVN to a lessor degree.
>
> The approach I've been playing with would create this loop structure:
> goto inner
> while (true) {
>   outer:
>   helper();
>   inner:
>   while( some_condition )
>     //do lots of work here
>     if (!some_rare_condition) { <-- This is loop variant
>       goto outer;
>     }
>   }
> }
>
> (Excuse the psuedo-C.  Hopefully the idea is clear.)
>
> You can see my stalking horse implementation here:
> http://reviews.llvm.org/D6872
>
> This is not an actual patch per se; it's just intended to make the
> discussion more concrete and thus hopefully easier to follow.
>
> The basic idea of this patch is to use profiling information about the
> frequency of a backedges to separate a loop with multiple latches into a
> loop nest with the rare backedge being the outer loop. We already use a set
> of static heuristics based on non-varying arguments to PHI nodes to do the
> same type of thing.
>
> The motivation is that doing this pulls rarely executed code out of the
> innermost loop and tends to greatly simplify analysis and optimization of
> that loop.
>

This is good thing to do from the code/block placement point of view -- to
improve icache utilization. Code layout is one of the most important passes
that can benefit from profile data.

> In particular, it can enable substantial LICM gains when there are
> clobbering calls down rare paths.
>

How can this enable more LICM?

> The original motivating case for this was the slow path of a safepoint
> poll, but I believe the possible benefit is more general as well.
>
> Points for discussion:
>
>    - Is using profile information for this purpose even a reasonable
>    thing to do?
>
>
yes.

>
>    - I chose to implement this without relying on the existing block
>    frequency analysis. My reasoning was that a) this is a rarely taken case
>    and adding an expensive analysis dependency probably wasn't worthwhile and
>    b) that block frequency analysis was more expensive/precise than I really
>    needed. Is this reasonable?
>
>
IMO no. Remember your pass should also work with real profile data
(instrumentation or sample based) -- you should  rely on existing profile
infrastructure to provide what you need. (Ideally you should treat it as
something that is always available for query).

>
>    - If so, is the notion of 'rareness' of a loop block something that's
>    worth extracting out on it's own and reusing? Are there other similar uses
>    anyone can think of?
>
>
Please do not introduce new notions for profile information -- there are
already many:)  You have block 'frequency' to represent intra-function
relative hotness. Currently missing, but if we need to represent
inter-procedural global hotness, something like bb execution 'count' may be
needed.

thanks,

David

>
>    - Currently, I'm only supporting a fairly small set of controlling
>    conditions. Are there important cases I'm not considering?
>    - Since the rarest latch is often deep in a loop - with other "if (X)
>    continue;" (i.e. latches) before it - this tends to create loops with
>    multiple exiting blocks. Some of the existing passes might not deal with
>    this well, is that a major concern? Suggestions for how to analysis and
>    validate?
>    - Currently, I've structured this as pulling off the rarest latch as
>    an outer iteration. I could also pull off the most popular latch as an
>    inner iteration. This might give different tradeoffs; thoughts?
>
> Generally, any thoughts anyone have on the problem or approach are
> welcome. I'm not particular attached to the particular approach laid out
> here and if there's a more advantageous approach, all the better.
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/295c2791/attachment.html>