[PATCH] Improve Cost model for SLPVectorizer when we have a vector division by power of 2

Karthik Bhat kv.bhat at samsung.com
Thu Aug 21 06:44:17 PDT 2014


Hi Andrea,
Thanks for the inputs. Yes, for UDIV the cost can be reduced to that of SRL.
For SDIV, though, I checked the code below with gcc and clang, and we seem to get 3 extra instructions (vpsrld, vpaddd, vpsrad). Please correct me if I'm wrong.
  void f(int* restrict a,int* restrict b,int* restrict c) {
    int i;
    for(i=0;i<4;i=i+4) {
      a[i] = (b[i]+c[i])/2;
      a[i+1] = (b[i+1]+c[i+1])/2;
      a[i+2] = (b[i+2]+c[i+2])/2;
      a[i+3] = (b[i+3]+c[i+3])/2;
    }
  }
Compiled with: clang -O3 -mavx2 test.c -S -o test.s
  #BB#0:                                 # %entry
  vmovdqu	(%rdx), %xmm0
  vpaddd	(%rsi), %xmm0, %xmm0
  vpsrld	$31, %xmm0, %xmm1
  vpaddd	%xmm1, %xmm0, %xmm0
  vpsrad	$1, %xmm0, %xmm0
  vmovdqu	%xmm0, (%rdi)
  retq
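
To spell out why the signed case needs those three instructions, here is a minimal scalar sketch of the same sequence for a single lane. The function names are made up for illustration, and it assumes that >> on a negative int32_t is an arithmetic shift (implementation-defined in C, but true for gcc and clang):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed divide by 2 for one lane, following the
   vpsrld / vpaddd / vpsrad steps from the listing above. */
int32_t sdiv2_shift(int32_t x) {
    /* vpsrld $31: logical shift isolates the sign bit (1 if x < 0, else 0). */
    int32_t bias = (int32_t)((uint32_t)x >> 31);
    /* vpaddd: add the bias so negative values round toward zero.
       This cannot overflow: the bias is 1 only when x is negative. */
    int32_t biased = x + bias;
    /* vpsrad $1: arithmetic shift right by one (assumed, see above). */
    return biased >> 1;
}

/* Hypothetical self-check: the shift sequence matches C's truncating '/'. */
void check_sdiv2(void) {
    for (int32_t x = -1000; x <= 1000; ++x)
        assert(sdiv2_shift(x) == x / 2);
}
```

A plain arithmetic shift alone would round toward negative infinity (for example -3 >> 1 gives -2, while -3 / 2 is -1), which is why the sign-bit bias is added first.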

I agree that reusing the existing table is a good option. I will update the patch accordingly.

Yes, we can vectorize division by a non-constant power of 2 on AVX2, but I suppose it will use vpsrlvd/vpsravd?
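
As a rough sketch of what I mean for the unsigned case: each lane is shifted by its own count, which is the operation vpsrlvd provides. The helper name and the 4-lane layout are only for illustration, it assumes every shift count is below 32, and whether the vectorizer would actually emit vpsrlvd here is my guess, not something I have verified:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = in[i] / (1u << k[i]) for 4 lanes,
   written as a per-lane logical shift right.
   Assumes k[i] < 32 for every lane. */
void udiv_pow2_lanes(uint32_t out[4], const uint32_t in[4],
                     const uint32_t k[4]) {
    for (int i = 0; i < 4; ++i) {
        /* Unsigned division by 2^k is exactly a logical shift by k. */
        out[i] = in[i] >> k[i];
    }
}
```

The signed variant would additionally need the rounding bias before the arithmetic shift, as in the constant case.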
Thanks all for helping me out here.

Regards
Karthik Bhat

http://reviews.llvm.org/D4971
