<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
I've been playing with approaches to getting better optimization of
loops which contain infrequently executed slow paths. I've gotten
as far as throwing together a proof of concept implementation of a
profile guided optimization to separate a single loop with multiple
latches into a loop nest, but I want to get feedback from interested
parties before investing much more effort. <br>
<br>
The primary case I'm concerned about looks something like this C
example:<br>
while( some_condition )<br>
//do lots of work here<br>
if (!some_rare_condition) { <-- This is loop variant<br>
helper();<br>
}<br>
}<br>
<br>
The key point here is that the 'do lots of work' section contains
things we could lift out of the loop, but our knowledge of memory
gets completely destoryed by the 'helper' call which doesn't get
inlined. This ends up crippling LICM in particular, and GVN to a
lessor degree. <br>
<br>
The approach I've been playing with would create this loop
structure:<br>
goto inner<br>
while (true) {<br>
outer:<br>
helper();<br>
inner:<br>
while( some_condition )<br>
//do lots of work here<br>
if (!some_rare_condition) { <-- This is loop variant<br>
goto outer;<br>
}<br>
}<br>
}<br>
<br>
(Excuse the psuedo-C. Hopefully the idea is clear.)<br>
<br>
You can see my stalking horse implementation here:<br>
<a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D6872">http://reviews.llvm.org/D6872</a><br>
<div class="phui-property-list-text-content">
<div class="phabricator-remarkup">
<p>This is not an actual patch per se; it's just intended to
make the discussion more concrete and thus hopefully easier to
follow. <br>
</p>
<p>The basic idea of this patch is to use profiling information
about the frequency of a backedges to separate a loop with
multiple latches into a loop nest with the rare backedge being
the outer loop. We already use a set of static heuristics
based on non-varying arguments to PHI nodes to do the same
type of thing.</p>
<p>The motivation is that doing this pulls rarely executed code
out of the innermost loop and tends to greatly simplify
analysis and optimization of that loop. In particular, it can
enable substantial LICM gains when there are clobbering calls
down rare paths. The original motivating case for this was the
slow path of a safepoint poll, but I believe the possible
benefit is more general as well.</p>
<p>Points for discussion:</p>
<ul class="remarkup-list">
<li class="remarkup-list-item">Is using profile information
for this purpose even a reasonable thing to do?</li>
<li class="remarkup-list-item">I chose to implement this
without relying on the existing block frequency analysis. My
reasoning was that a) this is a rarely taken case and adding
an expensive analysis dependency probably wasn't worthwhile
and b) that block frequency analysis was more
expensive/precise than I really needed. Is this reasonable?</li>
<li class="remarkup-list-item">If so, is the notion of
'rareness' of a loop block something that's worth extracting
out on it's own and reusing? Are there other similar uses
anyone can think of?</li>
<li class="remarkup-list-item">Currently, I'm only supporting
a fairly small set of controlling conditions. Are there
important cases I'm not considering?</li>
<li class="remarkup-list-item">Since the rarest latch is often
deep in a loop - with other "if (X) continue;" (i.e.
latches) before it - this tends to create loops with
multiple exiting blocks. Some of the existing passes might
not deal with this well, is that a major concern?
Suggestions for how to analysis and validate?</li>
<li class="remarkup-list-item">Currently, I've structured this
as pulling off the rarest latch as an outer iteration. I
could also pull off the most popular latch as an inner
iteration. This might give different tradeoffs; thoughts?</li>
</ul>
<p>Generally, any thoughts anyone have on the problem or
approach are welcome. I'm not particular attached to the
particular approach laid out here and if there's a more
advantageous approach, all the better.</p>
</div>
</div>
<br>
<br>
<br>
<br>
</body>
</html>