[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Sun Sep 21 13:15:49 PDT 2014

On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote:

> > If AVX is available I would expect the vpermilps/vpermilpd instruction to be used for all float/double single vector shuffles, especially as it can deal with the folded load case as well - this would avoid the integer/float execution domain transfer issue with using vpshufd.
> 
> Yes, this is the obvious solution to folding memory loads. It just isn't implemented yet.
> 
> Well, actually it is, but I haven't finished writing tests for it. =] 

Thanks Chandler - vpermilps/vpermilpd generation looks great now.

I've found another regression - byte shifts on pre-ssse3 targets are failing to make use of the vpslldq/vpsrldq instructions - I've attached some basic test cases.

Could vpslldq/vpsrldq be used on ssse3+ targets for the cases where zeros are being shifted in? It avoids the need for a zero register (although they aren't as good for memory folding).
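
To make the pattern concrete, here is a minimal sketch of the kind of 16-byte shuffle mask that is equivalent to a whole-register byte shift with zeros shifted in. It is illustrative only: it is not the contents of the attached test cases and not LLVM's lowering code, and the names (kZero, ByteMask, ByteShift, matchByteShift) are hypothetical. It assumes C++17 and models a mask as source byte indices 0..15 plus a marker for lanes that must be zero.

```cpp
#include <array>
#include <optional>

// Hypothetical model, not LLVM's actual data structures.
// Mask entries 0..15 select a byte of the source vector;
// kZero marks a result byte that must be zero.
constexpr int kZero = -1;
using ByteMask = std::array<int, 16>;

enum class ShiftDir { Left, Right };

struct ByteShift {
    ShiftDir dir;
    int bytes;  // shift amount in bytes, i.e. the instruction immediate
};

// Returns the shift if `mask` is equivalent to a whole-register byte shift
// that shifts in zeros (the pslldq / psrldq pattern), otherwise std::nullopt.
std::optional<ByteShift> matchByteShift(const ByteMask& mask) {
    for (int n = 1; n < 16; ++n) {
        bool left = true;
        bool right = true;
        for (int i = 0; i < 16; ++i) {
            // Left shift by n bytes: result[i] = (i < n) ? 0 : src[i - n]
            const int wantLeft = (i < n) ? kZero : i - n;
            // Right shift by n bytes: result[i] = (i + n < 16) ? src[i + n] : 0
            const int wantRight = (i + n < 16) ? i + n : kZero;
            left = left && (mask[i] == wantLeft);
            right = right && (mask[i] == wantRight);
        }
        if (left) return ByteShift{ShiftDir::Left, n};
        if (right) return ByteShift{ShiftDir::Right, n};
    }
    return std::nullopt;
}
```

For example, the mask {zero, zero, zero, zero, 0, 1, ..., 11} matches a left shift by 4 bytes, so in this model no separate zero register is needed to produce the zeroed lanes.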

Cheers, Simon. 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: byte_shift.ll
Type: application/octet-stream
Size: 4589 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140921/309c3196/attachment.obj>