[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Tue Sep 9 15:35:54 PDT 2014

First off, thanks for the *fantastic* testing and investigation. =]

On Tue, Sep 9, 2014 at 3:01 PM, Quentin Colombet <qcolombet at apple.com>
wrote:

> Hi Chandler,
>
> Here is a test case for the biggest offender (oourafft.c).
> To reproduce:
> llc -mcpu=core-avx-i -x86-experimental-vector-shuffle-lowering=true repro.ll
> llc -mcpu=core-avx-i -x86-experimental-vector-shuffle-lowering=false repro.ll
>
> The main problem is that we miss:
> vmovsd (%rdi,%rcx,8), %xmm2
> vmovlhps %xmm2, %xmm2, %xmm2 ## xmm2 = xmm2[0,0]
> =>
> vmovddup (%rdi,%rcx,8), %xmm2
>
> I do not know how problematic that is (it seems we catch up on the
> performance with just the previous transformation),
>

Actually, this is awesome, because this was also the main problem I saw. I
already wrote the fix, and just need to update the test cases and submit
it. =]

I think blendps is the other big missing piece as mentioned.
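
For readers following along, here is a minimal sketch of why the quoted
vmovsd + vmovlhps pair can be folded into a single vmovddup. It is a toy
Python model, not LLVM code: a register is assumed to be a tuple of two
64-bit lanes (low, high), and all function names are hypothetical helpers
invented for illustration.

```python
# Toy model (assumption): an xmm register is a tuple (low lane, high lane)
# of doubles, and memory is a plain Python list of doubles.

def vmovsd_load(memory, index):
    """Model of 'vmovsd (mem), %xmm': load one double into the low lane
    and zero the high lane."""
    return (memory[index], 0.0)


def vmovlhps(src1, src2):
    """Model of 'vmovlhps %src2, %src1, %dst': the low 64 bits of src1
    stay low, the low 64 bits of src2 move to the high lane."""
    return (src1[0], src2[0])


def vmovddup_load(memory, index):
    """Model of 'vmovddup (mem), %xmm': load one double and duplicate it
    into both lanes."""
    return (memory[index], memory[index])


def load_then_movlhps(memory, index):
    """The two-instruction sequence from the quoted assembly."""
    x = vmovsd_load(memory, index)
    return vmovlhps(x, x)  # xmm2 = xmm2[0,0]
```

Under this model both forms produce the same (x, x) register for any
loaded value, which is what makes the single-instruction form a safe
replacement.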

> but we also miss:
> vsubpd %xmm1, %xmm0, %xmm2
> vaddpd %xmm1, %xmm0, %xmm0
> vshufpd $2, %xmm0, %xmm2, %xmm0 ## xmm0 = xmm2[0],xmm0[1]
> =>
> vaddsubpd %xmm1, %xmm0, %xmm0
>
> I’ll look into the other regressions.
>

Maybe wait until I can land the duplicate move support and the blendps
support? I'd rather see what the results are after that.
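
As a rough illustration of the second missed pattern quoted above, the
sketch below models the vsubpd / vaddpd / vshufpd $2 sequence and compares
it with vaddsubpd. Again this is a toy Python model under the assumption
that a register is a (low, high) tuple of doubles; the function names are
hypothetical and the operand order follows the AT&T syntax used in the
quoted assembly.

```python
# Toy model (assumption): a register is a tuple (low lane, high lane).

def vsubpd(a, b):
    """Lane-wise a - b."""
    return (a[0] - b[0], a[1] - b[1])


def vaddpd(a, b):
    """Lane-wise a + b."""
    return (a[0] + b[0], a[1] + b[1])


def vshufpd(imm, src1, src2):
    """Bit 0 of imm picks the lane of src1 for the low result lane,
    bit 1 of imm picks the lane of src2 for the high result lane."""
    return (src1[imm & 1], src2[(imm >> 1) & 1])


def vaddsubpd(a, b):
    """Subtract in the low lane, add in the high lane."""
    return (a[0] - b[0], a[1] + b[1])


def sub_add_shuffle(xmm0, xmm1):
    """The three-instruction sequence from the quoted assembly."""
    xmm2 = vsubpd(xmm0, xmm1)      # vsubpd %xmm1, %xmm0, %xmm2
    xmm0 = vaddpd(xmm0, xmm1)      # vaddpd %xmm1, %xmm0, %xmm0
    return vshufpd(2, xmm2, xmm0)  # xmm0 = xmm2[0],xmm0[1]
```

For example, with inputs (1.0, 2.0) and (0.5, 0.25) both forms give
(0.5, 2.25) in this model.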

There is also some AVX-specific stuff that I've left FIXMEs for, which I
could probably address to pull performance up a bit.

FWIW, I've got the main test-suite reproducing your results for x86, but I
don't currently have a nice reproduction for SPEC, so digging into those
would help somewhat more.