[PATCH] [x86] Improve build_vector v8i16 codegen

Bruno Cardoso Lopes bruno.cardoso at gmail.com
Mon Jan 26 06:18:45 PST 2015

Hi nadav, spatel, anemet, chandlerc,

Currently, even with SSE2 enabled, we generate a series of movd and punpcklwd instructions to insert elements into a v8i16. The testcase in the patch is currently lowered to:

        movd    %r8d, %xmm0
        movd    24(%rsp), %xmm1
        punpcklwd       %xmm1, %xmm0
        movd    %edx, %xmm1
        movd    8(%rsp), %xmm2
        punpcklwd       %xmm2, %xmm1
        punpcklwd       %xmm0, %xmm1
        movd    %ecx, %xmm0
        movd    16(%rsp), %xmm2
        punpcklwd       %xmm2, %xmm0
        movd    %r9d, %xmm2
        movd    %esi, %xmm3
        punpcklwd       %xmm2, %xmm3
        punpcklwd       %xmm0, %xmm3
        punpcklwd       %xmm1, %xmm3

This could instead be a series of pinsrw instructions, going from 15 instructions to 8:

        pinsrw  $0, %esi, %xmm0
        pinsrw  $1, %edx, %xmm0
        pinsrw  $2, %ecx, %xmm0
        pinsrw  $3, %r8d, %xmm0
        pinsrw  $4, %r9d, %xmm0
        pinsrw  $5, 8(%rsp), %xmm0
        pinsrw  $6, 16(%rsp), %xmm0
        pinsrw  $7, 24(%rsp), %xmm0

This patch adds this change, while first looking for an opportunity to transform the build_vector into a SHUFFLE+VEC_INSERT_ELTS. Most of this patch just moves some functions around; that part will go in as a separate commit.
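
For readers without the patch at hand, here is a rough source-level sketch of the kind of function that produces a v8i16 build_vector from eight scalars. It is not the testcase from the patch (that one is LLVM IR). The names build_v8i16 and build_v8i16_by_insert and the leading unused argument are assumptions made for illustration; the unused argument is only there so that the lanes would arrive in %esi, %edx, %ecx, %r8d, %r9d and three stack slots, as in the listings above. It relies on the GCC/Clang vector extension.

```c
#include <stdint.h>

/* Eight 16-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef int16_t v8i16 __attribute__((vector_size(16)));

/* Hypothetical stand-in for the testcase: all eight lanes come from
 * scalar arguments, which the backend sees as a v8i16 build_vector.
 * 'unused' is an assumption; it takes the first argument register. */
v8i16 build_v8i16(void *unused,
                  int16_t a0, int16_t a1, int16_t a2, int16_t a3,
                  int16_t a4, int16_t a5, int16_t a6, int16_t a7)
{
    (void)unused;
    v8i16 v = { a0, a1, a2, a3, a4, a5, a6, a7 };
    return v;
}

/* The same vector built one lane at a time, which is the shape the
 * pinsrw sequence has: lane i is written with element i. */
v8i16 build_v8i16_by_insert(const int16_t a[8])
{
    v8i16 v = { 0 };
    for (int i = 0; i < 8; ++i)
        v[i] = a[i];
    return v;
}
```

Both functions return the same vector. Whether a given compiler emits pinsrw for either of them depends on its version and flags, so this only illustrates the input pattern, not the generated code.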



