[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Simon Pilgrim
llvm-dev at redking.me.uk
Sat Sep 20 07:12:39 PDT 2014
On 19 Sep 2014, at 21:22, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> 2. There are cases where we no longer fold a vector load in one of
> the operands of a shuffle.
> This is an example:
>
> vmovaps 320(%rsp), %xmm0
> vshufps $-27, %xmm0, %xmm0, %xmm0 # %xmm0 = %xmm0[1,1,2,3]
>
> Before, we used to emit the following sequence:
> # 16-byte Folded reload.
> vpshufd $1, 320(%rsp), %xmm0 # %xmm0 = mem[1,0,0,0]
>
> Note: the shuffle masks are different but both are valid because the
> upper bits in %xmm0 are unused. Later on, the code uses register %xmm0
> in a 'vcvtss2sd' instruction; only the lower 32 bits of %xmm0 have a
> meaning in this context.
> As for 1. I'll try to create a small reproducible.
Hi Andrea / Chandler / Quentin,
If AVX is available, I would expect the vpermilps/vpermilpd instructions to be used for all single-vector float/double shuffles, especially as they can handle the folded-load case as well. This would avoid the integer/float execution domain transfer issue that comes with using vpshufd.
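To make the mask comparison in the quoted note concrete, here is a minimal Python sketch that decodes the two immediates into per-lane source indices. The helper name decode_shuffle_imm8 is hypothetical and is not part of LLVM. It assumes the usual encoding of two bits per 32-bit lane, and that both vshufps sources are the same register, so the instruction acts as a single-input permute.

```python
def decode_shuffle_imm8(imm):
    """Return the source-element index chosen for each of the four 32-bit lanes.

    Hypothetical helper for illustration only. Assumes the 2-bits-per-lane
    imm8 encoding, with lane 0 in the lowest two bits.
    """
    imm &= 0xFF  # the assembly prints the byte as signed, so -27 is 0xE5
    return [(imm >> (2 * lane)) & 0b11 for lane in range(4)]


# New lowering: vshufps $-27, %xmm0, %xmm0, %xmm0
# Both sources are the same register, so this is a one-input permute.
new_mask = decode_shuffle_imm8(-27)

# Old lowering: vpshufd $1, 320(%rsp), %xmm0 (load folded into the shuffle)
old_mask = decode_shuffle_imm8(1)

print("new:", new_mask)  # new: [1, 1, 2, 3]
print("old:", old_mask)  # old: [1, 0, 0, 0]

# Only lane 0 is consumed by the later vcvtss2sd, and both masks agree there.
print("lane 0 matches:", new_mask[0] == old_mask[0])  # lane 0 matches: True
```

The immediate form of vpermilps uses the same two-bits-per-lane encoding within a 128-bit lane, so either mask above could be passed to it unchanged.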
Thanks, Simon.