[PATCH] Improve DAG combine pass on certain IR vector patterns
fglaser at apple.com
Fri Jan 16 15:11:03 PST 2015
> On Jan 16, 2015, at 3:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Fri, Jan 16, 2015 at 2:40 PM, Quentin Colombet <qcolombet at apple.com <mailto:qcolombet at apple.com>> wrote:
> Well, that may be the conclusion: The performance impact may be within the noise.
> Since this kind of patterns are very specific, this is not surprising.
> For the record, I tend to ignore the tests that run for less than 1 second (too noisy). Then, the noise level is usually around 1% on a quiet computer with fixed frequency, which is not too bad.
> Numbers would mostly be nice because I don't know if other targets have the thing that makes this such a huge win on x86 -- implicit concat with undef to form 2x-wide vectors.
> This may be an x86-specific win, in which case it should just be added as a target-specific combine.
Isn’t that typical of SIMD architectures in general? That is, if an arch supports both N and 2N vector sizes, an operation on size-N vectors typically clears the top half, right? Or on armv7-like architectures you can modify d0 and then address q0, right? I’m not super familiar with any architectures other than ARMv7 NEON and SSE/AVX that support multiple native sizes though, so correct me if I’m wrong!
I guess the worst case would be something like this:
concat xmm2, xmm0, xmm1
shuffle ymm3, ymm2
shuffle xmm2, xmm0, xmm1
concat xmm2, xmm3
If the implicit concat isn’t there, and the architecture has no benefit to using smaller shuffles, and the architecture has no two-source shuffle for reasonable element sizes, I guess it could end up with an extra op?
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-commits