<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>This email is mostly to record an idea. If I get lucky enough to
nerd snipe someone else before I have a chance to get back to
this, even better. :)</p>
<p>I was taking a look at Chandler's new loop unswitch pass. One of
the really key differences in the new pass is that it models the
approximate cost of the actual unswitch. At the moment, the
current cost model just considers whether particular blocks exist
in only one of the two resulting loops, but since the old
unswitcher just assumed every instruction had to be duplicated,
that's a huge improvement right there. Thinking about this while
trying to understand the likely implications for idiomatic code,
led me to the following ideas. <br>
</p>
<p><b>First Iteration Cost </b>- We end up with a lot of loops
which contain range checks before loop invariant conditions.
Almost always, we can prove that the range check isn't going to
fail on the first iteration. As a result, we can ignore the early
exit (or conditionally reached block) cost in one copy of the loop
as long as the branch in question dominates the latch. The main
value of this is eliminating a pass order dependency; we'd
otherwise have to handle the range check (via indvarsimply, irce,
or loop-predication) before rerunning unswitch. Example:</p>
<p>// example is C<br>
for (int i = 0; i < len; i++) {<br>
if (i < 2^20) throw_A();<br>
if (a == null) throw_B(); // We can unswitch this at low cost<br>
sum += a[i];<br>
}</p>
<p>This could be fairly simply implemented as a special case of
trivial unswitch where we advance through branches which are known
to be constant on the first iteration. But that interacts badly
with the second idea and gives up generality. <br>
</p>
<p><b>Data Flow Cost </b>- We could model the cost of producing the
loop exit values. Since we have LCSSA, we know we can cheaply
identify values used in the exit. For any value which is
otherwise unused in the loop, we end up producing only one copy of
the computation which produces the value. If we computed a
per-block "cost to produce values used in this block", we could
model this in the cost model. The interesting bit is that this
seems to subsume the current definition of trivial unswitch since
the cost for a trivial unswitch as considered by the full
unswitcher becomes zero. <br>
</p>
<p>// example, in C, using predicated loads/stores<br>
for (int i = 0; i < len; i++) {<br>
int v = 0;<br>
if (b[i]) { v = *p }<br>
if (C) {<br>
break;<br>
}<br>
if (b[i]) { *p = v + i }<br>
}</p>
<p>Philip<br>
</p>
<p><br>
</p>
<p><br>
</p>
<p><br>
</p>
<p><br>
</p>
</body>
</html>