[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Fri Sep 5 09:32:13 PDT 2014

Hi Chandler,

I've done some informal benchmarking on an AMD Jaguar core (amd16h)
with and without the experimental flag.  The tests were a mixture of
FP and integer tests.  I didn't see any significant performance
regressions, with most of the differences being in the noise (less
than 1%).  One test, however, did show a performance improvement of
~4%.

Unfortunately, another team, while doing internal testing, has seen
the new path generating illegal insertps masks.  A sample here:

    vinsertps    $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
    vinsertps    $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
    vinsertps    $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
    vinsertps    $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
    vinsertps    $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
    vinsertps    $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]
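
To illustrate why these immediates are illegal: the insertps immediate
is a single byte, with bits 7:6 selecting the source element, bits 5:4
the destination element, and bits 3:0 the zero mask.  Both $256 (0x100)
and $416 (0x1A0) need a ninth bit.  The following is a minimal Python
sketch of that check; the helper name decode_insertps_imm is
hypothetical and is not taken from the LLVM sources.

```python
def decode_insertps_imm(imm):
    """Split an INSERTPS immediate into its fields.

    Returns None when the value does not fit in the 8-bit immediate.
    (Hypothetical helper for illustration, not an LLVM API.)
    """
    if not 0 <= imm <= 0xFF:
        return None
    return {
        "src_elt": (imm >> 6) & 0x3,   # bits 7:6, element read from the source
        "dst_elt": (imm >> 4) & 0x3,   # bits 5:4, element written in the destination
        "zero_mask": imm & 0xF,        # bits 3:0, destination elements forced to zero
    }


for imm in (256, 416):
    # Show the raw value, whether it is encodable, and what the low byte decodes to.
    print(imm, hex(imm), decode_insertps_imm(imm), decode_insertps_imm(imm & 0xFF))
```

Decoding only the low byte of each value gives fields that agree with
the shuffle comments in the sample (element 0 into position 0 for $256,
element 2 into position 2 for $416), so the difference appears to be
confined to the extra bit 8.  That is only an observation from the
numbers above, not a diagnosis of the cause.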

We'll continue to look into this and do additional testing.

Thanks,
Rob.

--

Robert Lougher
SN Systems - Sony Computer Entertainment Group