[PATCH] Improve Cost model for SLPVectorizer when we have a vector division by power of 2
Karthik Bhat
kv.bhat at samsung.com
Thu Aug 21 06:44:17 PDT 2014
Hi Andrea,
Thanks for the inputs. Yes, for UDIV the cost can be reduced to that of SRL.
For SDIV, though, I checked the code below with gcc and clang, and we seem to get 3 instructions (vpsrld, vpaddd, vpsrad) for the division rather than a single shift. Please correct me if I'm wrong.
void f(int* restrict a, int* restrict b, int* restrict c) {
  int i;
  for (i = 0; i < 4; i = i + 4) {
    a[i]   = (b[i]   + c[i])   / 2;
    a[i+1] = (b[i+1] + c[i+1]) / 2;
    a[i+2] = (b[i+2] + c[i+2]) / 2;
    a[i+3] = (b[i+3] + c[i+3]) / 2;
  }
}
Compiled with: clang -O3 -mavx2 test.c -S -o test.s
#BB#0: # %entry
vmovdqu (%rdx), %xmm0
vpaddd (%rsi), %xmm0, %xmm0
vpsrld $31, %xmm0, %xmm1
vpaddd %xmm1, %xmm0, %xmm0
vpsrad $1, %xmm0, %xmm0
vmovdqu %xmm0, (%rdi)
retq
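For reference, here is a scalar sketch of what the vpsrld/vpaddd/vpsrad sequence does to one 32-bit lane. The function name sdiv2_lane is only illustrative, and the sketch assumes that >> on a negative int32_t is an arithmetic shift and that out-of-range unsigned-to-signed conversion wraps (both hold for gcc and clang, but are implementation-defined in ISO C).

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of the SDIV-by-2 sequence.
 * Hypothetical helper, not part of LLVM.
 * Assumes arithmetic >> on signed values and wrapping conversion
 * from uint32_t to int32_t (gcc/clang behaviour). */
static int32_t sdiv2_lane(int32_t x) {
    /* vpsrld $31: logical shift, yields 1 if x is negative, else 0 */
    uint32_t sign = (uint32_t)x >> 31;
    /* vpaddd: add the bias so the shift rounds toward zero */
    int32_t biased = (int32_t)((uint32_t)x + sign);
    /* vpsrad $1: arithmetic shift right by one */
    return biased >> 1;
}
```

Without the bias, a plain arithmetic shift would round negative odd values toward minus infinity (for example -3 >> 1 is -2), whereas C division truncates toward zero (-3 / 2 is -1). That seems to be why SDIV cannot be costed as a single shift.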
I agree that reusing the existing table is a good option. I will update the patch accordingly.
Yes, we can vectorize division by a non-constant power of 2 with AVX2, but I suppose it would use vpsrlvd/vpsravd?
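To illustrate the unsigned case only, here is a scalar sketch of a per-lane variable shift, which is roughly the operation vpsrlvd performs on four 32-bit lanes. The name udiv_pow2_lanes and the array layout are assumptions for illustration; the sketch also assumes every shift amount is below 32. The signed case would still need a rounding fixup before an arithmetic shift.

```c
#include <stdint.h>

/* Per-lane unsigned division by 2^k[i], modelled as a per-lane
 * logical right shift. Hypothetical helper, not part of LLVM.
 * Assumes k[i] < 32 for every lane. */
static void udiv_pow2_lanes(uint32_t out[4],
                            const uint32_t x[4],
                            const uint32_t k[4]) {
    for (int i = 0; i < 4; ++i) {
        /* x / (1u << k) == x >> k for unsigned x */
        out[i] = x[i] >> k[i];
    }
}
```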
Thanks all for helping me out here.
Regards
Karthik Bhat
http://reviews.llvm.org/D4971