[PATCH] Adjust the cost of vectorized SHL/SRL/SRA

Fri May 22 10:03:47 PDT 2015

In http://reviews.llvm.org/D9923#177012, @aschwaighofer wrote:

> I share Simon's concerns. Please make sure that we still get a good estimate for kernels like (these are from the rdar mentioned in the commit).
>
>   #define TYPE char
>   #define OP >>
>   #define SIZE 1024
>   #define TYPE_ALIGN __attribute__((aligned(16)))
>   
>   TYPE A1[SIZE] TYPE_ALIGN;
>   TYPE B1[SIZE] TYPE_ALIGN;
>   TYPE C1[SIZE] TYPE_ALIGN;
>   
>   void kernel1() {
>     for (int i = 0; i < SIZE; ++i) {
>       A1[i] = B1[i] OP C1[i];
>     }
>   }
>   
>
> or:
>
>   for(k=0, r=0; k<pos; k++)
>     r += (MAX_UNSIGNED) 1 << k;

Thanks for sharing the testcase. For the first testcase:

Without the patch, the generated code for the kernel loop is:
  .LBB0_1:                                # %for.body
                                          # =>This Inner Loop Header: Depth=1
          movsbl  B1+1024(%rax), %edx
          movb    C1+1024(%rax), %cl
          sarl    %cl, %edx
          movb    %dl, A1+1024(%rax)
          incq    %rax
          jne     .LBB0_1

With the patch, the generated code for the kernel loop is:
  .LBB0_1:                                # %vector.body
                                          # =>This Inner Loop Header: Depth=1
          movd    B1+1024(%rax), %xmm1    # xmm1 = mem[0],zero,zero,zero
          punpcklbw       %xmm1, %xmm1    # xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
          punpcklwd       %xmm1, %xmm1    # xmm1 = xmm1[0,0,1,1,2,2,3,3]
          psrad   $24, %xmm1
          movd    C1+1024(%rax), %xmm2    # xmm2 = mem[0],zero,zero,zero
          punpcklbw       %xmm2, %xmm2    # xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
          punpcklwd       %xmm2, %xmm2    # xmm2 = xmm2[0,0,1,1,2,2,3,3]
          psrad   $24, %xmm2
          psrad   %xmm2, %xmm1
          pand    %xmm0, %xmm1
          packuswb        %xmm1, %xmm1
          packuswb        %xmm1, %xmm1
          movd    %xmm1, A1+1024(%rax)
          addq    $4, %rax
          jne     .LBB0_1

The vectorized version is slightly better than the scalar version, but the cost estimate used to choose VF is not very good: it reports a cost of 8 at VF==1 and a cost of 2 at VF==4. The estimated costs of the vectorized sext and trunc are too low and don't match their real costs.
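
To make the comparison concrete, here is a minimal sketch of how such estimates are compared, assuming the vectorizer simply prefers the VF with the lowest loop cost divided by VF. The names vf_cost and pick_vf are hypothetical and are not LLVM APIs; the only data taken from this thread are the two reported costs.

```c
#include <stddef.h>

/* Hypothetical record: estimated cost of one loop iteration at a given VF. */
struct vf_cost {
    unsigned vf;
    unsigned loop_cost;
};

/* Hypothetical helper, not an LLVM API: return the VF whose cost per
   scalar element (loop_cost / vf) is lowest. Ties keep the earlier entry. */
static unsigned pick_vf(const struct vf_cost *c, size_t n) {
    unsigned best_vf = 1;
    double best = -1.0;
    for (size_t i = 0; i < n; ++i) {
        double per_elem = (double)c[i].loop_cost / (double)c[i].vf;
        if (best < 0.0 || per_elem < best) {
            best = per_elem;
            best_vf = c[i].vf;
        }
    }
    return best_vf;
}

/* Costs reported in this message: 8 at VF==1, 2 at VF==4. */
static const struct vf_cost reported[] = { {1, 8}, {4, 2} };
```

Under this assumption, pick_vf(reported, 2) selects VF==4 with a per-element cost of 0.5 against 8 for the scalar loop. That is a far larger predicted gain than the generated code above suggests, which is why the sext and trunc estimates look too low.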

Another problem is that the vectorizer doesn't know that the char->int type promotion here is unnecessary.
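
As a rough illustration of why the promotion can be skipped, the sketch below compares the promoted shift with one computed only on 8-bit values. It assumes shift amounts in the range 0..7, arithmetic right shift of negative ints, and wrapping narrowing conversions (the last two are implementation-defined in C but hold for GCC and Clang). All function names are hypothetical.

```c
#include <stdint.h>

/* What the C source means: promote to int, shift, truncate back to char. */
static int8_t sra_promoted(int8_t b, unsigned c) {
    return (int8_t)((int)b >> c);
}

/* The same shift using only the low 8 bits: a logical shift of the byte,
   then the top c bits are filled with copies of the sign bit. */
static int8_t sra_narrow(int8_t b, unsigned c) {
    uint8_t u = (uint8_t)b;
    uint8_t logical = (uint8_t)(u >> c);
    uint8_t fill = (u & 0x80u) ? (uint8_t)(0xFFu << (8u - c)) : 0;
    return (int8_t)(logical | fill);
}

/* Count inputs where the two versions disagree, over every byte value
   and every shift amount in 0..7 (an assumption of this sketch). */
static int count_mismatches(void) {
    int mismatches = 0;
    for (int b = -128; b <= 127; ++b)
        for (unsigned c = 0; c < 8; ++c)
            if (sra_promoted((int8_t)b, c) != sra_narrow((int8_t)b, c))
                ++mismatches;
    return mismatches;
}
```

For shift amounts of 8 or more the two versions are not guaranteed to agree, so the equivalence only applies when the shift amount is known to be in range.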

Can you give me the whole version of the second testcase? I am not sure my tweaked version is the right one.

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D9923
