[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
Diego Novillo
dnovillo at google.com
Wed Jan 15 16:13:27 PST 2014
I am starting to use the sample profiler to analyze new performance
opportunities. The loop unroller has popped up in several of the
benchmarks I'm running, libquantum in particular. There is a ~12%
opportunity when the runtime unroller is triggered.
This helps functions like quantum_sigma_x
(http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00149).
The function accounts for ~20% of total runtime. By enabling the
runtime unroller, we can speed up the program by about 12%.
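For readers unfamiliar with runtime unrolling, here is a rough hand-written
sketch of the transformation. The loop is a simplified stand-in (flip one bit
in every element of an array whose length is only known at run time), not the
libquantum source. The function names, the unroll factor of 4, and the
placement of the remainder loop before the main loop are all assumptions made
for illustration; the code the unroller actually emits may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in loop: flip bit `target` (assumed < 64) in every
// element. The trip count, states.size(), is not a compile-time constant.
void flip_bit_rolled(std::vector<std::uint64_t> &states, unsigned target) {
  const std::uint64_t mask = std::uint64_t{1} << target;
  for (std::size_t i = 0; i < states.size(); ++i)
    states[i] ^= mask;
}

// Hand-written equivalent of unrolling the loop above by 4 at run time.
void flip_bit_unrolled4(std::vector<std::uint64_t> &states, unsigned target) {
  const std::uint64_t mask = std::uint64_t{1} << target;
  const std::size_t n = states.size();
  const std::size_t remainder = n % 4;

  std::size_t i = 0;
  // Remainder loop: handle the n % 4 leftover iterations first.
  for (; i < remainder; ++i)
    states[i] ^= mask;

  // Main loop: the remaining count is a multiple of 4, so each pass
  // performs four iterations with a single compare-and-branch.
  for (; i < n; i += 4) {
    states[i] ^= mask;
    states[i + 1] ^= mask;
    states[i + 2] ^= mask;
    states[i + 3] ^= mask;
  }
}
```

Both functions produce the same result for any length, including lengths
that are not a multiple of 4. The unrolled version simply executes fewer
branches per element.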
I have been poking at the unroller a little bit. Currently, the
runtime unroller is only triggered by a special flag or if the target
states it in the unrolling preferences. We could also consult the
block frequency information here. If the loop header has a higher
relative frequency than the rest of the function, then we'd enable
runtime unrolling.
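A minimal sketch of the kind of check I have in mind, written as
self-contained C++ rather than against the real pass. LoopProfile,
should_try_runtime_unroll and the threshold kHotLoopRatio are hypothetical
names and values chosen for illustration; the frequencies are assumed to come
from block frequency information, with the function entry frequency standing
in for "the rest of the function".

```cpp
#include <cstdint>

// Hypothetical container for the two profile-derived numbers the
// heuristic needs. Neither name comes from LLVM.
struct LoopProfile {
  std::uint64_t header_freq;  // relative frequency of the loop header
  std::uint64_t entry_freq;   // relative frequency of the function entry
};

// Hypothetical threshold: how many times hotter than the function entry
// the loop header must be. The value 8 is a placeholder, not a tuned number.
constexpr std::uint64_t kHotLoopRatio = 8;

// Decide whether to attempt runtime unrolling for a loop.
bool should_try_runtime_unroll(const LoopProfile &p,
                               bool flag_or_target_allows) {
  // Existing behavior: an explicit flag or target preference enables it.
  if (flag_or_target_allows)
    return true;
  // Without usable profile data, stay conservative.
  if (p.entry_freq == 0)
    return false;
  // Proposed addition: enable it when the header is hot relative to entry.
  return p.header_freq / p.entry_freq >= kHotLoopRatio;
}
```

With these placeholder numbers, a header frequency of 1000 against an entry
frequency of 10 would enable runtime unrolling, while 20 against 10 would
not. The right threshold would have to come from measurements.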
Chandler also pointed me at the vectorizer, which has its own
unroller. However, the vectorizer only unrolls enough to serve the
target; it's not as general as the runtime-triggered unroller. From
what I've seen, it will get a maximum unroll factor of 2 on x86 (4 on
AVX targets). Additionally, the vectorizer only unrolls to aid
reduction variables. When I forced the vectorizer to unroll these
loops, the performance effects were nil.
I'm currently looking at changing LoopUnroll::runOnLoop() to consult
block frequency information for the loop header to decide whether to
try runtime triggers for loops that don't have a constant trip count
but could be partially peeled.
Does that sound reasonable?
Thanks. Diego.