[LLVMbugs] [Bug 5540] Partial loop unwinding with variable iteration count

bugzilla-daemon at cs.uiuc.edu bugzilla-daemon at cs.uiuc.edu
Sun Jan 3 17:53:46 PST 2010


bearophile <bearophile at mailas.com> changed:

           What    |Removed                     |Added
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |

--- Comment #5 from bearophile <bearophile at mailas.com>  2010-01-03 19:53:46 ---
Thank you for your explanation, that shows me how to act :-)

This optimization is present in the Java HotSpot VM and is applied often, so
Sun engineers evidently think it's not a waste of their time. And my
experiments show they are right.

In the last C attachment I've shown a better example: a stripped-down version
of the SciMark2 benchmark in C that runs the LU benchmark only and prints the
MFlops. Part of it uses conditional compilation: if you define DO_UNROLL it
performs an optimization quite similar to the one done by HotSpot, otherwise it
uses the original SciMark2 code, so you can compare the performance
improvement. Timings and more info are at the top of the code.

Note that in this code I assume the loop count is NOT known at compile-time.

If you want I can show you another code example where this optimization is

Note that generally the more code there is in the loop, the fewer times it's
useful to unwind it. Here HotSpot unwinds 8 times (but I've seen that about 10
is optimal) because the inner loop contains just one line:
Aii[jj] -= AiiJ * Aj[jj];
But in another example, where there's more stuff inside the loop, HotSpot
unwinds 4 times only, because unwinding more probably puts the CPU code cache
under too much pressure, reducing performance. Life is made of compromises.
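
To make the idea concrete, here is a minimal sketch of partial unwinding for
that one-line inner loop when the trip count is known only at run time. This is
not the code from the attachment: the function name update_row and its
parameters are hypothetical, and I use an unroll factor of 4 just to keep it
short (HotSpot uses 8 here). The main loop handles four elements per pass, and
a remainder loop handles the 0 to 3 leftover iterations.

```c
/* Hypothetical helper, not taken from the SciMark2 attachment.
   Computes Aii[jj] -= AiiJ * Aj[jj] for jj in [start, n),
   where n is NOT known at compile time. */
static void update_row(double *Aii, const double *Aj, double AiiJ,
                       int start, int n)
{
    int jj = start;

    /* Main loop, partially unwound 4 times: runs while at least
       four elements remain. */
    for (; jj + 3 < n; jj += 4) {
        Aii[jj]     -= AiiJ * Aj[jj];
        Aii[jj + 1] -= AiiJ * Aj[jj + 1];
        Aii[jj + 2] -= AiiJ * Aj[jj + 2];
        Aii[jj + 3] -= AiiJ * Aj[jj + 3];
    }

    /* Remainder loop: the 0..3 iterations left over. */
    for (; jj < n; jj++)
        Aii[jj] -= AiiJ * Aj[jj];
}
```

The result is the same as the plain loop; only the number of loop-condition
tests and branches per element goes down.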

I think this feature is not too hard to implement, and I think it can be
useful if applied wisely. But there's a problem: I think a static compiler
generally doesn't know which loops to partially unroll. HotSpot knows because
the Java code is usually being profiled, while C/C++/D code compiled by LLVM
is not (unless LLVM profile-guided optimization is used). So this may require
user annotations, or profile-guided optimization, or smart compiler heuristics
to understand where and what to partially unroll (how much to unroll is
probably not hard to determine, just by counting how many instructions are
present inside the loop). Both the annotation route and the profile-guided
optimization route seem doable and not too hard. The heuristics route looks
harder to me.

