[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Chandler Carruth chandlerc at google.com
Mon Sep 29 06:05:18 PDT 2014


On Tue, Sep 23, 2014 at 4:28 AM, Chandler Carruth <chandlerc at google.com>
wrote:

> AVX2 is still in-flight.


AVX2 is pretty much done.

All of the AVX and AVX2 lowering has now been heavily fuzz tested (a few
million test cases and counting). I believe it is correct.
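
For readers unfamiliar with this kind of testing, here is a minimal sketch of
the check a shuffle fuzzer performs. It is not the actual fuzzer used for this
work. The names referenceShuffle, randomMask and matchesOnDefinedLanes are
hypothetical, and the code assumes a two-input shuffle where mask values in
[0, N) select from the first input, [N, 2N) select from the second, and -1
marks an undefined lane.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Reference semantics for a two-input shuffle (assumed, see lead-in).
// Undefined lanes (mask value -1) are left as 0 in the result.
std::vector<int> referenceShuffle(const std::vector<int> &A,
                                  const std::vector<int> &B,
                                  const std::vector<int> &Mask) {
  const int N = static_cast<int>(A.size());
  std::vector<int> R(Mask.size(), 0);
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    const int M = Mask[I];
    if (M < 0)
      continue; // undefined lane
    R[I] = M < N ? A[M] : B[M - N];
  }
  return R;
}

// Draw a random N-element mask with entries in [-1, 2N - 1].
std::vector<int> randomMask(std::mt19937 &Rng, int N) {
  std::uniform_int_distribution<int> Dist(-1, 2 * N - 1);
  std::vector<int> Mask(N);
  for (int &M : Mask)
    M = Dist(Rng);
  return Mask;
}

// A lowering is acceptable if it agrees with the reference on every
// defined lane; undefined lanes may hold anything.
bool matchesOnDefinedLanes(const std::vector<int> &Got,
                           const std::vector<int> &Want,
                           const std::vector<int> &Mask) {
  if (Got.size() != Mask.size() || Want.size() != Mask.size())
    return false;
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] >= 0 && Got[I] != Want[I])
      return false;
  return true;
}
```

A fuzz loop would repeatedly call randomMask, run the lowering under test on
the mask, and compare its output with referenceShuffle using
matchesOnDefinedLanes.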

I've added the basic framework for AVX-512. Nothing interesting is
implemented there, mostly because I think there are still very big
unanswered questions about how AVX-512 should work. For example, it would
be good to lower with index-destructive vs. table-destructive shuffles
based on the number of uses, but that isn't really possible today. Even better would
be to actually respect any loop structure or other invariant properties.

There are still plenty of performance gains to be had in AVX or AVX2
(broadcast support, work to combine away intermediate shuffling such as can
be seen in the v32i8 test cases with interleaved unpacks, etc.).

However, I think essentially all of the test cases (other than broadcast
and shift test cases) have been fixed. I'd really like to enable this and
let folks submit patches for the few remaining cases that impact them
significantly. As far as I can tell, the new code paths offer very
significant advantages for the hardware folks have today, with only a few
downsides. While they are less complete for AVX-512 than the current
code, I don't really think that should be the priority.

Are there any remaining objections?
-Chandler