[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
atrick at apple.com
Thu Jan 16 20:47:23 PST 2014
On Jan 15, 2014, at 4:13 PM, Diego Novillo <dnovillo at google.com> wrote:
> Chandler also pointed me at the vectorizer, which has its own
> unroller. However, the vectorizer only unrolls enough to serve the
> target, it's not as general as the runtime-triggered unroller. From
> what I've seen, it will get a maximum unroll factor of 2 on x86 (4 on
> avx targets). Additionally, the vectorizer only unrolls to aid
> reduction variables. When I forced the vectorizer to unroll these
> loops, the performance effects were nil.
Vectorization and partial unrolling (aka runtime unrolling) for ILP should to be the same pass. The profitability analysis required in each case is very closely related, and you never want to do one before or after the other. The analysis makes sense even for targets without vector units. The “vector unroller” has an extra restriction (unlike the LoopUnroll pass) in that it must be able to interleave operations across iterations. This is usually a good thing to check before unrolling, but the compiler’s dependence analysis may be too conservative in some cases.
Currently, the cost model is conservative w.r.t unrolling because we don't want to increase code size. But minimally, we should unroll until we can saturate the resources/ports. e.g. a loop with a single load should be unrolled x2 so we can do two loads per cycle. If you can come up with improved heuristics without generally impacting code size that’s great.
Where we're currently looking for a constant trip count to avoid excessive unrolling, we could now look at profiled trip count if it isn't constant.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev