[PATCH] D14829: [SLP] Vectorize gather-like idioms ending at non-consecutive loads.
Matthew Simpson via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 2 13:22:46 PST 2015
mssimpso added a comment.
Nadav/Hal,
Below are the statistically significant compile-time differences observed for the test suite, spec2000, and spec2006 (there is only one). Results were computed from 10 samples each, using median aggregation, 95% confidence intervals, and a 0.05 significance level for the Mann–Whitney U test.
Program                                                             Base   Change   Delta (%)
---------------------------------------------------------------------------------------
MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael  0.73   0.89     18.88
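For readers unfamiliar with the method, here is a minimal sketch of the two statistics named above: median aggregation and the Mann–Whitney U statistic. It assumes each configuration's compile times sit in a plain array of doubles. The function names are hypothetical, and this is not the harness that produced the table. The final step, which turns U into a p-value for comparison against the 0.05 level, is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b) {
  double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Median of n > 0 samples; sorts the array in place. */
static double median(double *v, size_t n) {
  qsort(v, n, sizeof *v, cmp_double);
  return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* Mann-Whitney U statistic of sample x against sample y: the number of
   pairs (x[i], y[j]) with x[i] > y[j], where ties count as one half.
   The U of y against x is nx * ny minus this value. Converting U into a
   p-value (exact tables or a normal approximation) is omitted. */
static double mann_whitney_u(const double *x, size_t nx,
                             const double *y, size_t ny) {
  double u = 0.0;
  for (size_t i = 0; i < nx; ++i)
    for (size_t j = 0; j < ny; ++j)
      u += (x[i] > y[j]) ? 1.0 : (x[i] == y[j]) ? 0.5 : 0.0;
  return u;
}
```

With 10 samples per side, as in the runs above, U ranges from 0 to 100. Values near either end mean one build is consistently faster than the other.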
No statistically significant performance differences were observed for spec2000 and spec2006 on a Cortex-A57-like CPU (but see the explanation below). Our infrastructure is currently unable to produce run-time data for the test suite. However, a binary diff shows that the change modified the following benchmarks.
Program
---------------------------------------------------------------------------------------
MultiSource/Applications/JM/lencod
MultiSource/Applications/minisat
MultiSource/Benchmarks/Bullet
MultiSource/Benchmarks/MallocBench/espresso
spec2006/h264ref
spec2006/povray
Regarding the lack of performance differences in spec2000 and spec2006: the current patch is somewhat limited, and I was planning a follow-on to address the issue. The indices of the GEPs that seed the expressions are forced to be i64. However, the expressions may not require that much precision. As a result, we end up with unneeded extensions, vectors with fewer lanes than is optimal, or both, and the cost model often prevents us from vectorizing. See the test cases in the patch for an example: we are not yet able to vectorize the second case because of this issue.
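The following sketch illustrates the precision issue. It assumes a 64-bit target and 128-bit vector registers (the NEON register width on a Cortex-A57). The C function is an illustrative gather-like idiom, not one of the patch's test cases, and both function names are hypothetical. Four isomorphic 32-bit index computations feed loads that are not consecutive. Each index is widened to 64 bits before it addresses memory, and the explicit casts mirror the i64 GEP operand. If the index expressions are vectorized at i64, only 2 lanes fit per register, compared with 4 at i32.

```c
#include <stdint.h>

/* Hypothetical helper: how many elements of a given width fit in one
   vector register (both widths in bits). */
static unsigned lanes_per_register(unsigned register_bits, unsigned element_bits) {
  return register_bits / element_bits;
}

/* Illustrative gather-like idiom: four isomorphic 32-bit index
   computations whose results are sign-extended to 64 bits (mirroring the
   i64 GEP index) and then used for non-consecutive loads from table. */
static int32_t sum_gathered(const int32_t *table, const int32_t *idx, int32_t offset) {
  int32_t i0 = idx[0] + offset;
  int32_t i1 = idx[1] + offset;
  int32_t i2 = idx[2] + offset;
  int32_t i3 = idx[3] + offset;
  return table[(int64_t)i0] + table[(int64_t)i1] +
         table[(int64_t)i2] + table[(int64_t)i3];
}
```

For example, `lanes_per_register(128, 64)` gives 2 and `lanes_per_register(128, 32)` gives 4. This halving is the reason the cost model can reject an i64-seeded tree that would otherwise pay off.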
The follow-on would essentially incorporate James's type-shrinking work from the loop vectorizer, rewriting the expressions in a narrower type when profitable. Work in progress has shown that combining type shrinking with this patch can significantly improve performance for spec2006/h264ref, at least.
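To make the idea of type shrinking concrete, here is a simplified, range-based stand-in. It is not the loop vectorizer's actual analysis, and the function names are hypothetical. The sketch assumes the value range of an index expression is already known and picks the narrowest signed integer type that holds the range. It then reports how many lanes one vector register would offer in that type. A real profitability decision would also weigh the cost of the remaining extensions.

```c
#include <stdint.h>

/* Narrowest of 8, 16, 32 or 64 bits whose signed range holds every value
   in [lo, hi]. A range-based stand-in for type shrinking, not the loop
   vectorizer's implementation. */
static unsigned min_signed_width(int64_t lo, int64_t hi) {
  static const unsigned widths[] = {8, 16, 32};
  for (unsigned k = 0; k < sizeof widths / sizeof widths[0]; ++k) {
    unsigned w = widths[k];
    int64_t min = -((int64_t)1 << (w - 1));
    int64_t max = ((int64_t)1 << (w - 1)) - 1;
    if (lo >= min && hi <= max)
      return w;
  }
  return 64;
}

/* Lanes available per vector register if the expression is rewritten in
   the narrowest type that still holds its range. */
static unsigned lanes_after_shrinking(int64_t lo, int64_t hi, unsigned register_bits) {
  return register_bits / min_signed_width(lo, hi);
}
```

With 128-bit registers, an index known to lie in [0, 70000] needs 32 bits and gets 4 lanes instead of the 2 that i64 allows.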
Please let me know what you think of this plan and whether the optimization is better suited for SLP or SelectionDAG. Thanks again!
http://reviews.llvm.org/D14829