[LLVMdev] Separating loop nests based on profile information?
Xinliang David Li
xinliangli at gmail.com
Mon Jan 12 19:27:42 PST 2015
On Wed, Jan 7, 2015 at 5:19 PM, Philip Reames <listmail at philipreames.com>
wrote:
> I've been playing with approaches to getting better optimization of loops
> which contain infrequently executed slow paths. I've gotten as far as
> throwing together a proof of concept implementation of a profile guided
> optimization to separate a single loop with multiple latches into a loop
> nest, but I want to get feedback from interested parties before investing
> much more effort.
>
> The primary case I'm concerned about looks something like this C example:
> while( some_condition )
> //do lots of work here
> if (!some_rare_condition) { <-- This is loop variant
> helper();
> }
> }
>
> The key point here is that the 'do lots of work' section contains things
> we could lift out of the loop, but our knowledge of memory gets completely
> destoryed by the 'helper' call which doesn't get inlined. This ends up
> crippling LICM in particular, and GVN to a lessor degree.
>
> The approach I've been playing with would create this loop structure:
> goto inner
> while (true) {
> outer:
> helper();
> inner:
> while( some_condition )
> //do lots of work here
> if (!some_rare_condition) { <-- This is loop variant
> goto outer;
> }
> }
> }
>
> (Excuse the psuedo-C. Hopefully the idea is clear.)
>
> You can see my stalking horse implementation here:
> http://reviews.llvm.org/D6872
>
> This is not an actual patch per se; it's just intended to make the
> discussion more concrete and thus hopefully easier to follow.
>
> The basic idea of this patch is to use profiling information about the
> frequency of a backedges to separate a loop with multiple latches into a
> loop nest with the rare backedge being the outer loop. We already use a set
> of static heuristics based on non-varying arguments to PHI nodes to do the
> same type of thing.
>
> The motivation is that doing this pulls rarely executed code out of the
> innermost loop and tends to greatly simplify analysis and optimization of
> that loop.
>
This is good thing to do from the code/block placement point of view -- to
improve icache utilization. Code layout is one of the most important passes
that can benefit from profile data.
> In particular, it can enable substantial LICM gains when there are
> clobbering calls down rare paths.
>
How can this enable more LICM?
> The original motivating case for this was the slow path of a safepoint
> poll, but I believe the possible benefit is more general as well.
>
> Points for discussion:
>
> - Is using profile information for this purpose even a reasonable
> thing to do?
>
>
yes.
>
> - I chose to implement this without relying on the existing block
> frequency analysis. My reasoning was that a) this is a rarely taken case
> and adding an expensive analysis dependency probably wasn't worthwhile and
> b) that block frequency analysis was more expensive/precise than I really
> needed. Is this reasonable?
>
>
IMO no. Remember your pass should also work with real profile data
(instrumentation or sample based) -- you should rely on existing profile
infrastructure to provide what you need. (Ideally you should treat it as
something that is always available for query).
>
> - If so, is the notion of 'rareness' of a loop block something that's
> worth extracting out on it's own and reusing? Are there other similar uses
> anyone can think of?
>
>
Please do not introduce new notions for profile information -- there are
already many:) You have block 'frequency' to represent intra-function
relative hotness. Currently missing, but if we need to represent
inter-procedural global hotness, something like bb execution 'count' may be
needed.
thanks,
David
>
> - Currently, I'm only supporting a fairly small set of controlling
> conditions. Are there important cases I'm not considering?
> - Since the rarest latch is often deep in a loop - with other "if (X)
> continue;" (i.e. latches) before it - this tends to create loops with
> multiple exiting blocks. Some of the existing passes might not deal with
> this well, is that a major concern? Suggestions for how to analysis and
> validate?
> - Currently, I've structured this as pulling off the rarest latch as
> an outer iteration. I could also pull off the most popular latch as an
> inner iteration. This might give different tradeoffs; thoughts?
>
> Generally, any thoughts anyone have on the problem or approach are
> welcome. I'm not particular attached to the particular approach laid out
> here and if there's a more advantageous approach, all the better.
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/295c2791/attachment.html>
More information about the llvm-dev
mailing list