[PATCH] D14829: [SLP] Vectorize gather-like idioms ending at non-consecutive loads.

hfinkel@anl.gov via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 10 12:32:25 PST 2015


hfinkel added a comment.

In http://reviews.llvm.org/D14829#300767, @mssimpso wrote:

> Nadav/Hal,
>
> Below are the statistically significant compile-time differences observed for the test suite, spec2000, and spec2006 (there is only one). Results were computed from 10 samples per program, using median aggregation, 95% confidence intervals, and a 0.05 significance level for the Mann–Whitney U test.
>
>   Program                                                             Base  Change      %
>   ---------------------------------------------------------------------------------------
>   MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael  0.73    0.89  18.88
>
>
> No statistically significant performance differences were observed for spec2000 and spec2006 on a Cortex-A57-like CPU (but see the explanation below). Our infrastructure is currently unable to produce run-time data for the test suite. However, a binary diff shows that the following benchmarks were modified by the change.
>
>   Program
>   ---------------------------------------------------------------------------------------
>   MultiSource/Applications/JM/lencod
>   MultiSource/Applications/minisat
>   MultiSource/Benchmarks/Bullet
>   MultiSource/Benchmarks/MallocBench/espresso
>   spec2006/h264ref
>   spec2006/povray
>
>
> Regarding the lack of performance differences observed in spec2000 and spec2006: the current patch is somewhat limited, and I was planning a follow-on to address the issue. The indices of the GEPs that seed the expressions are forced to be i64, but the expressions may not require that much precision, so we end up with unneeded extensions and/or narrower vectors than optimal, and the cost model often prevents us from vectorizing (the idiom is illustrated in the sketch below the quoted text). Please see the test cases in the patch for an example; we are not yet able to vectorize the second case because of this issue.
>
> The follow-on would essentially be to incorporate James's type-shrinking work from the loop vectorizer in order to rewrite the expressions in a narrower type when profitable. Work in progress has shown that type-shrinking combined with this patch can provide a significant performance improvement, at least for spec2006/h264ref.
>
> Please let me know what you think of this plan and whether the optimization is better suited for SLP or SelectionDAG. Thanks again!
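
For concreteness, the kind of gather-like idiom being discussed looks roughly like the sketch below. This is an illustration only (the function and variable names are made up, not a test case from the patch): the stores are consecutive and can seed an SLP tree, the table loads are non-consecutive, and on a 64-bit target each subscript implies extending the narrow index to i64 before it can feed the GEP.

  // Illustrative sketch only; hypothetical code, not a test case from the patch.
  // The stores to dst are consecutive; the loads from table are gather-like
  // (non-consecutive). The index arithmetic is i8 work, but because GEP
  // indices are emitted as i64 on a 64-bit target, each lane also carries a
  // zext to i64 -- extra extension cost for work that needs only 8 bits.
  void gather4(unsigned char *dst, const unsigned char *table,
               const unsigned char *a, const unsigned char *b) {
    dst[0] = table[(unsigned char)(a[0] ^ b[0])];
    dst[1] = table[(unsigned char)(a[1] ^ b[1])];
    dst[2] = table[(unsigned char)(a[2] ^ b[2])];
    dst[3] = table[(unsigned char)(a[3] ^ b[3])];
  }

If the follow-on type-shrinking rewrites the index expressions in i8 where that precision suffices, the extensions drop out of the cost and more lanes fit per vector, which is presumably why it helps cases like h264ref.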


I think SLP is a good place for this, given that the expression trees might be quite large. However, a nearly 20% compile-time slowdown (on an application which, judging by your list of binary diffs, is not even modified by the patch and so sees no speedup) is not acceptable. Can you please profile the compilation of that application and propose a patch that limits whatever bad behavior is asserting itself there?
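
For reference, the usual way to rein in this kind of compile-time pathology in LLVM is a hidden cl::opt budget that the expensive search checks before continuing. The sketch below is purely hypothetical (the option name, default value, and call site are invented for illustration and are not part of D14829):

  // Hypothetical sketch of the standard "budget" pattern for capping
  // compile-time pathologies; not code from the patch under review.
  #include "llvm/Support/CommandLine.h"

  static llvm::cl::opt<unsigned> MaxGatherChains(
      "slp-max-gather-chains", llvm::cl::Hidden, llvm::cl::init(64),
      llvm::cl::desc("Illustrative cap on gather-like load chains examined "
                     "per block before giving up"));

  // The matching code would then bail out once the budget is spent, e.g.:
  //   if (++NumChainsExamined > MaxGatherChains)
  //     return false;  // too expensive; leave the rest scalar

Profiling the rijndael compile should show whether the time goes into the chain search itself or somewhere downstream, which would determine where such a limit belongs.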


http://reviews.llvm.org/D14829




