[PATCH] D14829: [SLP] Vectorize gather-like idioms ending at non-consecutive loads.

Wed Dec 2 13:22:46 PST 2015

mssimpso added a comment.

Nadav/Hal,

Below are the statistically significant compile-time differences observed for the test suite, spec2000, and spec2006 (there is only one). Results were computed from 10 samples each, using median aggregation, 95% confidence intervals, and a 0.05 significance level for the Mann–Whitney U test.

  Program                                                             Base  Change      %
  ---------------------------------------------------------------------------------------
  MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael  0.73    0.89  18.88

No statistically significant performance differences were observed for spec2000 and spec2006 on a Cortex-A57-like CPU (but see the explanation below). Our infrastructure is currently unable to produce run-time data for the test suite. However, a binary diff shows that the following benchmarks were modified by the change.

  Program
  ---------------------------------------------------------------------------------------
  MultiSource/Applications/JM/lencod
  MultiSource/Applications/minisat
  MultiSource/Benchmarks/Bullet
  MultiSource/Benchmarks/MallocBench/espresso
  spec2006/h264ref
  spec2006/povray

Regarding the lack of performance differences observed in spec2000 and spec2006, the current patch is somewhat limited, and I was planning a follow-on to address the issue. The indices of the GEPs that seed the expressions are forced to be i64. However, the expressions may not require that much precision, so we end up with unneeded extensions and/or narrower vectors than is optimal, and the cost model often prevents us from vectorizing. Please see the test cases in the patch for an example. We are not yet able to vectorize the second case because of this issue.
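
As a rough illustration of the limitation described above, here is a minimal C sketch of a gather-like idiom. It is not taken from the patch's test cases: the function name gather_reduce, the table g, and the 16-bit operand types are all assumptions chosen for the example. Each index expression needs only a narrow type, but on a 64-bit target the index feeding the address computation is typically extended to 64 bits before the non-consecutive load.

```c
#include <stdint.h>

/* Hypothetical example (names and types are assumptions, not from the patch).
 * Four index expressions are computed from 16-bit inputs, and each one
 * ends at a load from g whose address is not consecutive with the others. */
static int gather_reduce(const int *g, const uint16_t *a, const uint16_t *b) {
    int sum = 0;
    /* a[k] - b[k] fits in a narrow type, but the value used to index g
     * is widened to the pointer-sized index type on a 64-bit target. */
    sum += g[a[0] - b[0]];
    sum += g[a[1] - b[1]];
    sum += g[a[2] - b[2]];
    sum += g[a[3] - b[3]];
    return sum;
}
```

If the four subtractions are vectorized in a 64-bit element type, a 128-bit register holds only two of them, whereas a narrower element type would hold all four at once. That is the kind of trade-off the cost model is weighing here.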

The follow-on would essentially incorporate James's type-shrinking work from the loop vectorizer to rewrite the expressions in the narrower type when profitable. Work in progress has shown that type-shrinking combined with this patch can provide a significant performance improvement for at least spec2006/h264ref.

Please let me know what you think of this plan and whether the optimization is better suited for SLP or SelectionDAG. Thanks again!

http://reviews.llvm.org/D14829