[PATCH] D25350: [X86] Enable interleaved memory accesses by default

Mon Oct 10 08:24:53 PDT 2016

mkuper added a comment.

In https://reviews.llvm.org/D25350#566175, @RKSimon wrote:

> Any performance test gains/regressions?

We've had performance gains on internal benchmarks. (And, anecdotally, one performance regression that turned out to stem from an improvement: we've started vectorizing a loop we were not vectorizing before, and that loop really should perform better when vectorized, but its dynamic trip count tends to be 1 or 2...)
As far as I know, Intel also saw performance gains on the benchmarks they run, but I'll let Ayal/Elena/Dorit speak for themselves.
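Why a trip count of 1 or 2 gives no benefit: a loop vectorizer such as LLVM's guards the vector body with a minimum-iteration check and sends short trips straight to the scalar remainder loop, so the only effect of vectorizing is the extra check. The arithmetic below is a minimal sketch of that split; the helper name and the vectorization factor (VF=8) and interleave count (UF=2) are hypothetical values chosen for illustration, not what the cost model picked for that loop.

```python
def split_trip_count(n, vf, uf=1):
    """Split n scalar iterations between a vector body and a scalar remainder.

    Hypothetical helper: each trip through the vector body covers vf * uf
    scalar iterations; whatever is left over runs in the scalar remainder loop.
    """
    step = vf * uf
    vector_iters = (n // step) * step
    return vector_iters, n - vector_iters


# Assumed VF=8, UF=2: a trip count of 1 or 2 never reaches the vector body.
for n in (1, 2, 100):
    vec, rem = split_trip_count(n, vf=8, uf=2)
    print(f"n={n}: vector body covers {vec}, scalar remainder covers {rem}")
```

With these assumed factors, any trip count below 16 runs entirely in the scalar remainder, which is consistent with the anecdote above.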

As to public benchmarks: unfortunately, I don't have an up-to-date SPEC run with this patch. I can run one if you want, unless Intel already happens to have the results handy?

Anyway, if you have internal benchmarks you want to run this on pre-commit, please do; codegen isn't necessarily happy with the shuffle sequences this generates. We had some really bad regressions initially, which led to r283480. I don't expect any more big surprises, since the x86 cost model is still *really* conservative w.r.t. interleaved memory accesses (teaching it to be more precise is a separate issue, and probably ties in with Farhana's work in https://reviews.llvm.org/D24681), but who knows.
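For readers unfamiliar with where those shuffle sequences come from: when the loop vectorizer handles an interleaved access group (for example, loads of `a[2*i]` and `a[2*i+1]` in the same iteration), it emits one wide load and then uses shuffles to extract each member of the group, and the x86 backend has to lower those shuffles. The Python below is only a scalar emulation of that idea, assuming a stride-2 loop and VF=4; the function names are hypothetical and it does not model the actual IR or the instructions the backend selects.

```python
def scalar_pairs_sum(a, n):
    # Scalar form of a stride-2 (interleaved) loop, e.g. in C:
    #   for (i = 0; i < n; i++) out[i] = a[2*i] + a[2*i+1];
    return [a[2 * i] + a[2 * i + 1] for i in range(n)]


def vectorized_pairs_sum(a, n, vf=4):
    """Emulate vectorizing the loop above with an interleaved access group.

    Each vector iteration does one wide load of 2*vf consecutive elements,
    then deinterleaves it with two strided shuffles (even and odd lanes)
    instead of issuing 2*vf separate scalar loads.
    """
    out = []
    vec_end = (n // vf) * vf
    for i in range(0, vec_end, vf):
        wide = a[2 * i : 2 * i + 2 * vf]  # one wide load of 2*vf elements
        evens = wide[0::2]                # shuffle: lanes 0, 2, 4, ...
        odds = wide[1::2]                 # shuffle: lanes 1, 3, 5, ...
        out.extend(e + o for e, o in zip(evens, odds))
    # Scalar remainder for the last n % vf iterations.
    out.extend(a[2 * j] + a[2 * j + 1] for j in range(vec_end, n))
    return out


data = list(range(22))  # 11 interleaved (x, y) pairs
print(vectorized_pairs_sum(data, 11) == scalar_pairs_sum(data, 11))
```

The emulation only shows that the wide-load-plus-shuffle form computes the same result as the scalar loop; whether it is actually faster depends on how cheaply the target can perform those shuffles, which is the part the conservative x86 cost model is guarding.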

https://reviews.llvm.org/D25350