[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

Wed Jan 15 20:50:55 PST 2014

----- Original Message -----
> From: "Sean Silva" <silvas at purdue.edu>
> To: "Diego Novillo" <dnovillo at google.com>
> Cc: nadav at apple.com, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, January 15, 2014 9:38:32 PM
> Subject: Re: [LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
> 
> On Wed, Jan 15, 2014 at 7:13 PM, Diego Novillo <dnovillo at google.com> wrote:
> 
> 
> I am starting to use the sample profiler to analyze new performance
> opportunities. The loop unroller has popped up in several of the
> benchmarks I'm running. In particular, libquantum. There is a ~12%
> opportunity when the runtime unroller is triggered.
> 
> 
> 
> Pardon my ignorance, but what exactly does "runtime unroller" mean?
> In particular the "runtime" part of it. 

He's referring to the code in lib/Transforms/Utils/LoopUnrollRuntime.cpp, which can be enabled with the -unroll-runtime flag. 'Runtime' refers to the fact that the trip count is not known at compile time, so the unrolled loop needs extra code to handle the leftover iterations.
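
To make the idea concrete, here is a minimal hand-written sketch in C of what unrolling by 4 looks like when the trip count is only known at run time. The function names, the unroll factor, and the bit-flip loop body are assumptions chosen for illustration; this is not the code the pass emits. In this sketch the leftover iterations (n mod 4) run first; running them after the unrolled body would be equally valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference loop: the trip count n is only known when the function runs. */
void flip_bit(uint64_t *state, size_t n, unsigned bit)
{
    assert(bit < 64);
    for (size_t i = 0; i < n; i++)
        state[i] ^= (uint64_t)1 << bit;
}

/* Hand-unrolled equivalent (factor 4, hypothetical choice). */
void flip_bit_unrolled4(uint64_t *state, size_t n, unsigned bit)
{
    assert(bit < 64);
    const uint64_t mask = (uint64_t)1 << bit;
    size_t i = 0;
    size_t rem = n % 4; /* leftover iterations, computed at run time */

    /* Run the n mod 4 leftover iterations one at a time. */
    for (; i < rem; i++)
        state[i] ^= mask;

    /* The remaining count is a multiple of 4, so the unrolled body
       never reads or writes past the end of the array. */
    for (; i < n; i += 4) {
        state[i]     ^= mask;
        state[i + 1] ^= mask;
        state[i + 2] ^= mask;
        state[i + 3] ^= mask;
    }
}
```

The unrolled version executes one loop branch per four elements instead of one per element, at the cost of the extra remainder loop and the modulo computation.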

 -Hal

> Just from the name I'm
> imagining JIT'ing an unrolled version on the fly, or choosing an
> unrolled version at runtime, but neither of those interpretations
> seems likely.
> 
> 
> -- Sean Silva
> 
> 
> 
> This helps functions like quantum_sigma_x
> (http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00149).
> The function accounts for ~20% of total runtime. By enabling the
> runtime unroller, we can speed up the program by about 12%.
> 
> I have been poking at the unroller a little bit. Currently, the
> runtime unroller is only triggered by a special flag or if the target
> states it in the unrolling preferences. We could also consult the
> block frequency information here. If the loop header has a higher
> relative frequency than the rest of the function, then we'd enable
> runtime unrolling.
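
A minimal sketch in C of the kind of check described above, under stated assumptions: the struct, the function name, and the hot_ratio threshold are hypothetical, and "the rest of the function" is approximated by the frequency of the function's entry block. In LLVM the frequencies would come from block frequency analysis rather than from a hand-filled struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one loop's profile data. */
struct loop_profile {
    uint64_t entry_freq;          /* relative frequency of the function entry block */
    uint64_t header_freq;         /* relative frequency of the loop header */
    bool trip_count_is_constant;  /* true if the trip count is known at compile time */
};

/* Returns true when the loop has a non-constant trip count and its header
   is at least hot_ratio times as frequent as the function entry. */
bool should_runtime_unroll(const struct loop_profile *p, uint64_t hot_ratio)
{
    assert(p != NULL);
    if (p->trip_count_is_constant)
        return false;  /* ordinary unrolling already covers this case */
    if (p->entry_freq == 0)
        return false;  /* no usable profile data for this function */
    /* Integer division avoids overflow in hot_ratio * entry_freq. */
    return p->header_freq / p->entry_freq >= hot_ratio;
}
```

With a threshold of 8, a loop whose header runs 100 times per function entry would qualify, while one whose header runs barely more often than the entry block would not.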
> 
> Chandler also pointed me at the vectorizer, which has its own
> unroller. However, the vectorizer only unrolls enough to serve the
> target; it's not as general as the runtime-triggered unroller. From
> what I've seen, it will get a maximum unroll factor of 2 on x86 (4 on
> AVX targets). Additionally, the vectorizer only unrolls to aid
> reduction variables. When I forced the vectorizer to unroll these
> loops, the performance effects were nil.
> 
> I'm currently looking at changing LoopUnroll::runOnLoop() to consult
> block frequency information for the loop header to decide whether to
> try runtime triggers for loops that don't have a constant trip count
> but could be partially peeled.
> 
> Does that sound reasonable?
> 
> 
> Thanks. Diego.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory