[LLVMdev] loop vectorizer
Frank Winter
fwinter at jlab.org
Wed Oct 30 09:05:23 PDT 2013
The loop vectorizer seems to be not able to vectorize the following code:
void bar(std::uint64_t start, std::uint64_t end, float * __restrict__
c, float * __restrict__ a, float * __restrict__ b)
{
const std::uint64_t inner = 4;
for (std::uint64_t i = start ; i < end ; ++i ) {
const std::uint64_t ir0 = ( (i/inner) * 2 + 0 ) * inner + i%4;
const std::uint64_t ir1 = ( (i/inner) * 2 + 1 ) * inner + i%4;
c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
}
}
LV: Found a loop: for.body
LV: Found an induction variable.
LV: We need to do 0 pointer comparisons.
LV: Checking memory dependencies
LV: Bad stride - Not an AddRecExpr pointer %arrayidx11 = getelementptr
inbounds float* %c, i64 %add2 SCEV: ((4 * %add2)<nsw> + %c)<nsw>
LV: Bad stride - Not an AddRecExpr pointer %arrayidx15 = getelementptr
inbounds float* %c, i64 %add8 SCEV: ((4 * %add8)<nsw> + %c)<nsw>
LV: Src Scev: ((4 * %add2)<nsw> + %c)<nsw>Sink Scev: ((4 * %add8)<nsw> +
%c)<nsw>(Induction step: 0)
LV: Distance for store float %add10, float* %arrayidx11, align 4 to
store float %add14, float* %arrayidx15, align 4: ((4 * %add8)<nsw> + (-4
* %add2))
Non-consecutive pointer access
LV: We don't need a runtime memory check.
LV: Can't vectorize due to memory conflicts
LV: Not vectorizing.
Here the code:
entry:
%cmp14 = icmp ult i64 %start, %end
br i1 %cmp14, label %for.body, label %for.end
for.body: ; preds = %entry,
%for.body
%i.015 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
%div = lshr i64 %i.015, 2
%mul = shl i64 %div, 3
%rem = and i64 %i.015, 3
%add2 = or i64 %mul, %rem
%add8 = or i64 %add2, 4
%arrayidx = getelementptr inbounds float* %a, i64 %add2
%0 = load float* %arrayidx, align 4
%arrayidx9 = getelementptr inbounds float* %b, i64 %add2
%1 = load float* %arrayidx9, align 4
%add10 = fadd float %0, %1
%arrayidx11 = getelementptr inbounds float* %c, i64 %add2
store float %add10, float* %arrayidx11, align 4
%arrayidx12 = getelementptr inbounds float* %a, i64 %add8
%2 = load float* %arrayidx12, align 4
%arrayidx13 = getelementptr inbounds float* %b, i64 %add8
%3 = load float* %arrayidx13, align 4
%add14 = fadd float %2, %3
%arrayidx15 = getelementptr inbounds float* %c, i64 %add8
store float %add14, float* %arrayidx15, align 4
%inc = add i64 %i.015, 1
%exitcond = icmp eq i64 %inc, %end
br i1 %exitcond, label %for.end, label %for.body
for.end: ; preds = %for.body,
%entry
ret void
Why is it saying Bad stride?Are the 'rem' and 'div' instruction asking
too much from it?
The code should be vectorizable. Here the index access for start=0, end=16:
loop count i = 0 index_0 = 0 index_1 = 4
loop count i = 1 index_0 = 1 index_1 = 5
loop count i = 2 index_0 = 2 index_1 = 6
loop count i = 3 index_0 = 3 index_1 = 7
loop count i = 4 index_0 = 8 index_1 = 12
loop count i = 5 index_0 = 9 index_1 = 13
loop count i = 6 index_0 = 10 index_1 = 14
loop count i = 7 index_0 = 11 index_1 = 15
loop count i = 8 index_0 = 16 index_1 = 20
loop count i = 9 index_0 = 17 index_1 = 21
loop count i = 10 index_0 = 18 index_1 = 22
loop count i = 11 index_0 = 19 index_1 = 23
loop count i = 12 index_0 = 24 index_1 = 28
loop count i = 13 index_0 = 25 index_1 = 29
loop count i = 14 index_0 = 26 index_1 = 30
loop count i = 15 index_0 = 27 index_1 = 31
Any ideas?
Frank
More information about the llvm-dev
mailing list