[llvm-commits] Please review - One more shuffle optimization for AVX

Demikhovsky, Elena elena.demikhovsky at intel.com
Sun Jun 24 05:29:33 PDT 2012


Hi,

I recently implemented a number of optimizations for AVX and AVX2 code. Most of them show a significant speedup on real workloads.
I'll send them for review one by one, each accompanied by appropriate tests.

The current patch optimizes frequently used shuffle patterns and shortens the generated instruction sequence, for example:
Before:
        vshufps $-35, %xmm1, %xmm0, %xmm2 ## xmm2 = xmm0[1,3],xmm1[1,3]
        vpermilps       $-40, %xmm2, %xmm2 ## xmm2 = xmm2[0,2,1,3]
        vextractf128    $1, %ymm1, %xmm1
        vextractf128    $1, %ymm0, %xmm0
        vshufps $-35, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[1,3],xmm1[1,3]
        vpermilps       $-40, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,1,3]
        vinsertf128     $1, %xmm0, %ymm2, %ymm0
After:
        vshufps $13, %ymm0, %ymm1, %ymm1 ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4]
        vshufps $13, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,3,0,0,5,7,4,4]
        vunpcklps       %ymm1, %ymm0, %ymm0 ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5]
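
The equivalence of the two sequences can be checked by modelling the lane semantics in a few lines of Python. This is only a sketch and not part of the patch: the helpers `vshufps`, `vpermilps`, `vunpcklps`, `before` and `after` are hypothetical models written from the asm comments above, vectors are plain lists, and each helper takes the operand that supplies the low result elements first (the reverse of the AT&T operand order in the listing).

```python
def vshufps(src1, src2, imm):
    """Per 128-bit lane: two elements picked from src1, then two from src2,
    each selected by a 2-bit field of the 8-bit immediate."""
    imm &= 0xFF  # the listing prints the immediate as a signed byte
    out = []
    for base in range(0, len(src1), 4):
        out += [src1[base + (imm & 3)],
                src1[base + ((imm >> 2) & 3)],
                src2[base + ((imm >> 4) & 3)],
                src2[base + ((imm >> 6) & 3)]]
    return out


def vpermilps(src, imm):
    """Immediate form: same per-lane selection, with both sources equal."""
    return vshufps(src, src, imm)


def vunpcklps(src1, src2):
    """Per 128-bit lane: interleave the two low elements of src1 and src2."""
    out = []
    for base in range(0, len(src1), 4):
        out += [src1[base], src2[base], src1[base + 1], src2[base + 1]]
    return out


def before(a, b):
    """Old sequence: shuffle each 128-bit half separately, then recombine."""
    lo = vpermilps(vshufps(a[:4], b[:4], -35), -40)
    hi = vpermilps(vshufps(a[4:], b[4:], -35), -40)  # vextractf128 $1 halves
    return lo + hi                                   # vinsertf128 $1


def after(a, b):
    """New sequence: two 256-bit shuffles followed by one unpack."""
    t1 = vshufps(b, a, 13)   # ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4]
    t0 = vshufps(a, a, 13)   # ymm0 = ymm0[1,3,0,0,5,7,4,4]
    return vunpcklps(t0, t1)


a = list(range(8))        # stands in for ymm0, elements 0..7
b = list(range(8, 16))    # stands in for ymm1, elements 8..15
print(before(a, b))       # [1, 9, 3, 11, 5, 13, 7, 15]
print(after(a, b))        # [1, 9, 3, 11, 5, 13, 7, 15]
```

With the two inputs numbered 0..7 and 8..15, both sequences yield the element order 1, 9, 3, 11, 5, 13, 7, 15, i.e. the odd elements of the two vectors interleaved.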

Thank you.

- Elena


-------------- next part --------------
A non-text attachment was scrubbed...
Name: avx_opt1.diff
Type: application/octet-stream
Size: 4449 bytes
Desc: avx_opt1.diff
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120624/37fa88d7/attachment.obj>