[LLVMdev] Separating loop nests based on profile information?
Philip Reames
listmail at philipreames.com
Wed Jan 7 17:19:04 PST 2015
I've been playing with approaches to getting better optimization of
loops which contain infrequently executed slow paths. I've gotten as
far as throwing together a proof of concept implementation of a profile
guided optimization to separate a single loop with multiple latches into
a loop nest, but I want to get feedback from interested parties before
investing much more effort.
The primary case I'm concerned about looks something like this C example:
while( some_condition )
//do lots of work here
if (!some_rare_condition) { <-- This is loop variant
helper();
}
}
The key point here is that the 'do lots of work' section contains things
we could lift out of the loop, but our knowledge of memory gets
completely destoryed by the 'helper' call which doesn't get inlined.
This ends up crippling LICM in particular, and GVN to a lessor degree.
The approach I've been playing with would create this loop structure:
goto inner
while (true) {
outer:
helper();
inner:
while( some_condition )
//do lots of work here
if (!some_rare_condition) { <-- This is loop variant
goto outer;
}
}
}
(Excuse the psuedo-C. Hopefully the idea is clear.)
You can see my stalking horse implementation here:
http://reviews.llvm.org/D6872
This is not an actual patch per se; it's just intended to make the
discussion more concrete and thus hopefully easier to follow.
The basic idea of this patch is to use profiling information about the
frequency of a backedges to separate a loop with multiple latches into a
loop nest with the rare backedge being the outer loop. We already use a
set of static heuristics based on non-varying arguments to PHI nodes to
do the same type of thing.
The motivation is that doing this pulls rarely executed code out of the
innermost loop and tends to greatly simplify analysis and optimization
of that loop. In particular, it can enable substantial LICM gains when
there are clobbering calls down rare paths. The original motivating case
for this was the slow path of a safepoint poll, but I believe the
possible benefit is more general as well.
Points for discussion:
* Is using profile information for this purpose even a reasonable
thing to do?
* I chose to implement this without relying on the existing block
frequency analysis. My reasoning was that a) this is a rarely taken
case and adding an expensive analysis dependency probably wasn't
worthwhile and b) that block frequency analysis was more
expensive/precise than I really needed. Is this reasonable?
* If so, is the notion of 'rareness' of a loop block something that's
worth extracting out on it's own and reusing? Are there other
similar uses anyone can think of?
* Currently, I'm only supporting a fairly small set of controlling
conditions. Are there important cases I'm not considering?
* Since the rarest latch is often deep in a loop - with other "if (X)
continue;" (i.e. latches) before it - this tends to create loops
with multiple exiting blocks. Some of the existing passes might not
deal with this well, is that a major concern? Suggestions for how to
analysis and validate?
* Currently, I've structured this as pulling off the rarest latch as
an outer iteration. I could also pull off the most popular latch as
an inner iteration. This might give different tradeoffs; thoughts?
Generally, any thoughts anyone have on the problem or approach are
welcome. I'm not particular attached to the particular approach laid out
here and if there's a more advantageous approach, all the better.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150107/83c3990d/attachment.html>
More information about the llvm-dev
mailing list