[PATCH] Improve DAG combine pass on certain IR vector patterns

Fri Jan 16 15:11:03 PST 2015

> On Jan 16, 2015, at 3:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
> 
> 
> On Fri, Jan 16, 2015 at 2:40 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> Well, that may be the conclusion: The performance impact may be within the noise.
> Since these patterns are very specific, this is not surprising.
> For the record, I tend to ignore tests that run for less than 1 second (too noisy). For the remaining ones, the noise level is usually around 1% on a quiet machine with a fixed CPU frequency, which is not too bad.
> 
> Numbers would mostly be nice because I don't know if other targets have the thing that makes this such a huge win on x86 -- implicit concat with undef to form 2x-wide vectors.
> 
> This may be an x86-specific win, in which case it should just be added as a target-specific combine.

Isn’t that typical of SIMD architectures in general? That is, if an arch supports both N and 2N vector sizes, an operation on size-N vectors typically clears the top half, right? Or, on ARMv7-like architectures, you can modify d0 and then address q0, right? I’m not super familiar with any architectures other than ARMv7 NEON and SSE/AVX that support multiple native sizes, though, so correct me if I’m wrong!
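To make the register-file question concrete, here is a toy model (the function names are hypothetical, and lanes are plain ints) of where a narrow result lands inside a wide register. A VEX-encoded 128-bit AVX instruction zeroes the upper half of the ymm register, a legacy-SSE-encoded one leaves it untouched, and on ARMv7 NEON, d0 and d1 are simply the low and high halves of q0.

```python
# Toy model of how an N-lane write lands in a 2N-lane register.
# Illustration only, not an emulator of any real ISA.

def avx_vex_write(ymm, xmm_result):
    """VEX-encoded 128-bit op: write the low half, zero the upper half."""
    n = len(xmm_result)
    return xmm_result + [0] * (len(ymm) - n)

def sse_legacy_write(ymm, xmm_result):
    """Legacy-SSE-encoded op: write the low half, keep the upper half."""
    n = len(xmm_result)
    return xmm_result + ymm[n:]

def neon_d_write(q, d_result, high):
    """ARMv7 NEON: q0 aliases d0 (low half) and d1 (high half)."""
    n = len(d_result)
    return q[:n] + d_result if high else d_result + q[n:]
```

For example, `neon_d_write(neon_d_write([0] * 4, [1, 2], high=False), [3, 4], high=True)` gives `[1, 2, 3, 4]`: writing d0 and d1 leaves their concatenation in q0 with no extra instruction, which is the ARMv7 flavor of an implicit concat.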

I guess the worst case would be something like this:

old pseudocode:

concat ymm2, xmm0, xmm1
shuffle ymm3, ymm2

new pseudocode:

shuffle xmm2, xmm0, xmm1
concat xmm2, xmm3
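One reading of the two pseudocode sequences, assuming the upper half of the wide shuffle's result is undef, can be checked with a small scalar model. Vectors are Python lists, a mask entry of `None` marks an undef lane, and the helper names are hypothetical; this is not LLVM's DAGCombiner code.

```python
# Scalar model: shuffle(concat(a, b)) vs. concat(shuffle(a, b), undef).

def concat(a, b):
    """Join two N-wide vectors into one 2N-wide vector."""
    return a + b

def shuffle(src1, src2, mask):
    """Two-source shuffle: indices index into src1 followed by src2.
    A None mask entry produces an undef lane (modeled as None)."""
    both = src1 + src2
    return [None if m is None else both[m] for m in mask]

def old_form(a, b, wide_mask):
    # Old: concat the two narrow inputs, then one wide one-source shuffle.
    return shuffle(concat(a, b), [], wide_mask)

def new_form(a, b, wide_mask):
    n = len(a)
    # The narrowing is only valid if the upper half of the result is undef.
    assert all(m is None for m in wide_mask[n:])
    # New: one narrow two-source shuffle, then concat with an undef half.
    lo = shuffle(a, b, wide_mask[:n])
    return concat(lo, [None] * n)
```

With `a = [0, 1, 2, 3]`, `b = [4, 5, 6, 7]` and mask `[1, 4, 2, 7, None, None, None, None]`, both forms return `[1, 4, 2, 7, None, None, None, None]`. Because the two-source shuffle indexes into `a` followed by `b`, the low-half mask can be reused unchanged.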

If the implicit concat isn’t there, the architecture gains nothing from using narrower shuffles, and it has no two-source shuffle for reasonable element sizes, I guess the new form could end up costing an extra op?
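The worst case above can be tallied with a toy instruction-count model. Every instruction costs 1; the two flags are hypothetical target properties (not real LLVM hooks); narrow and wide shuffles are assumed to cost the same; and a two-source shuffle the target lacks is assumed to take two ops to emulate.

```python
# Toy op-count model for the old and new pseudocode sequences.

def old_cost():
    # Concat of two distinct narrow inputs (1) + one wide shuffle (1).
    return 2

def new_cost(implicit_concat, two_source_narrow_shuffle):
    # Assumption: emulating a missing two-source shuffle takes 2 ops.
    shuffle_ops = 1 if two_source_narrow_shuffle else 2
    # Widening the narrow result is free if the target concats implicitly.
    concat_ops = 0 if implicit_concat else 1
    return shuffle_ops + concat_ops
```

Under these assumptions, `new_cost(True, True)` is 1 (a win over 2), `new_cost(False, True)` ties at 2, and `new_cost(False, False)` is 3, i.e. one extra op, which matches the worst case described above.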

Fiona