[PATCH] Adding the loop vectorizer to the LTO pipeline

Fri Feb 21 15:07:57 PST 2014

----- Original Message -----
> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> To: "llvm-commits" <llvm-commits at cs.uiuc.edu>
> Cc: "Nadav Rotem" <nrotem at apple.com>, "Hal J. Finkel" <hfinkel at anl.gov>, "Chandler Carruth" <chandlerc at gmail.com>,
> "renato golin" <renato.golin at linaro.org>
> Sent: Friday, February 21, 2014 3:50:08 PM
> Subject: [PATCH] Adding the loop vectorizer to the LTO pipeline
> 
> Hi all,
> 
> I would like to add the loop vectorizer to the LTO pipeline. 

Yes, please do.

 -Hal

> During LTO, more loops become countable: with the help of
> GlobalModRef, LICM can move the load of the loop bound out of the
> loop. As a consequence, we can vectorize more loops. Currently, we
> are not making use of this opportunity.
> 
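
To illustrate the LICM point above, here is a minimal C sketch. The names
(loop_bound, scale_global_bound, scale_hoisted_bound, versions_agree) are
hypothetical and do not come from the patch. In the first function the
bound lives in a global, so without whole-program information the compiler
has to assume the store to a[i] may change it and reload it on every
iteration. The second function is the hand-hoisted form, which is roughly
what LICM can produce once alias analysis shows that the store cannot
modify the global.

```c
#include <assert.h>
#include <string.h>

#define CAPACITY 16

/* Hypothetical loop bound stored in a global variable. */
int loop_bound = 8;

/* As written, the bound is read from the global on every iteration.
 * The store to a[i] goes through an int pointer, so in a single
 * translation unit the compiler must assume it may alias loop_bound. */
void scale_global_bound(int *a) {
    for (int i = 0; i < loop_bound; i++)
        a[i] *= 2;
}

/* Hand-hoisted form: the bound is loaded once before the loop, so the
 * trip count is known on entry and the loop is countable. */
void scale_hoisted_bound(int *a) {
    const int n = loop_bound;
    for (int i = 0; i < n; i++)
        a[i] *= 2;
}

/* Returns 1 if both versions give the same result on input that does
 * not alias loop_bound, 0 otherwise. */
int versions_agree(void) {
    int x[CAPACITY], y[CAPACITY];
    assert(loop_bound <= CAPACITY);
    for (int i = 0; i < CAPACITY; i++)
        x[i] = y[i] = i + 1;
    scale_global_bound(x);
    scale_hoisted_bound(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

The two functions are equivalent only when the array does not overlap the
global. The sketch assumes that this is the fact the LTO-time analysis
establishes, for example when the address of the global is never taken
anywhere in the program.
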
> In my measurements on ARM and x86, I saw no serious regressions and
> some bumps in performance. The one benchmark that benefits
> significantly is twolf with the ref dataset (~5% on an ARM
> architecture and ~5% on x86-64 Sandy Bridge), along with some
> internal benchmarks.
> 
> Link time, as measured by timing the clang link step, increases by
> about 1% on average over the test-suite + externals.
> 
> The results on x86-64 -mavx -O3 -flto:
> 
> 
> Performance Regressions - Execution Time
>
> Benchmark                                                            Δ  Previous   Current       σ   Δ (B)   σ (B)
> MultiSource/Benchmarks/mafft/pairlocalalign                      2.86%   22.2553   22.8911  0.0250   0.00%  0.0250
> 
> 
> Performance Improvements - Execution Time
>
> Benchmark                                                            Δ  Previous   Current       σ   Δ (B)   σ (B)
> SingleSource/Benchmarks/Misc/matmul_f64_4x4                    -28.95%    0.2660    0.1890  0.0002   0.00%  0.0002
> MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt        -16.08%    1.9063    1.5998  0.0020   0.00%  0.0020
> MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt   -7.77%    2.5324    2.3356  0.0000   0.00%  0.0000
> MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl   -6.07%    3.2678    3.0695  0.0005   0.00%  0.0005
> MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl       -5.68%    3.5590    3.3570  0.0011   0.00%  0.0011
> External/SPEC/CINT2000/300_twolf/300_twolf                      -4.64%    3.5099    3.3471  0.0081   0.00%  0.0081
> SingleSource/Benchmarks/Misc/mandel                             -4.30%    0.4304    0.4119  0.0001   0.00%  0.0001
> MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt       -2.93%    6.8906    6.6885  0.0007   0.00%  0.0007
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory