[PATCH] Improve DAG combine pass on certain IR vector patterns
qcolombet at apple.com
Fri Jan 16 14:40:59 PST 2015
On Jan 16, 2015, at 2:24 PM, Fiona Glaser <fglaser at apple.com> wrote:
>> On Jan 16, 2015, at 2:20 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> On Jan 16, 2015, at 1:57 PM, Fiona Glaser <fglaser at apple.com> wrote:
>>> > 1. Run your patch through clang-format please. The patch does not follow the LLVM formatting guidelines.
>>> Done, and changed per Mehdi’s suggestions.
>>> > 2. What is the impact of this on arm64 and armv7s generated code? Although the approach makes sense to me, I want to be sure we do not degrade other targets. Note that I do not expect you to run tests if you cannot :).
>>> I don’t think it should even affect any target that doesn’t have canonical vector sizes of both N and 2*N, for a data type of N/2 or smaller. Otherwise the case this patch targets can’t come up.
>> I do not quite follow the condition on the data type, but regarding the vector sizes, for instance, both v2i32 and v4i32 are legal IIRC on ARM, which would indicate that the optimization can kick in there, unless I am missing something of course.
> The case comes up when you have a concat of two vectors of size N/2, creating a vector of size N, which is then concatenated with undef to create a 2*N vector.
Thanks for the clarifications, I had missed the N/2 part.
> I figure it -should- at least help if it somehow did come up on ARM, since it effectively converts 128-bit shuffles to 64-bit shuffles in that case.
Well, in that case, you could try a test case with the i16 type (v2i16, v4i16, and v8i16). I think all of those are legal, so it may demonstrate some impact.
>>>> 3. What is the runtime performance impact on x86_64, with and without -mavx2?
>>> I’m not sure in general; this affects a few very specific vector constructs that were being pessimized.
>> Right, but I would have liked some empirical evidence. Sometimes we get surprises from our lowering even when the IR/DAG is supposed to be better :).
> I’m new to this; what’s the typical way of demonstrating this? I tried the LLVM external test suite, but the noise is far too high to draw solid conclusions about performance.
Well, that may be the conclusion: The performance impact may be within the noise.
Since these kinds of patterns are very specific, this is not surprising.
For the record, I tend to ignore the tests that run for less than 1 second (too noisy). Then, the noise level is usually around 1% on a quiet computer with fixed frequency, which is not too bad.
Thanks for the follow-up.
More information about the llvm-commits