<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">I changed the input C code to use a 64-bit
type for the loop indices (this eliminates the 'sext' instructions in
the IR).<br>
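For reference, here is a C sketch of what the modified input looks like. It is reconstructed from the IR that follows, not copied from the original source file. The i64 arguments suggest a 64-bit signed index type, but the use of int64_t is an assumption; it could just as well have been long.

```c
#include <assert.h>
#include <stdint.h>

/* Reconstructed from the -O0 IR (assumption: int64_t for the indices).
   sum[4] is zero-initialized (the memset in the IR), the inner q loop has
   a constant trip count of 4 (the loop LV reports as for.cond1), and the
   four partial sums are combined in order at the end. */
float foo(int64_t start, int64_t end, float *A)
{
    float sum[4] = {0.f, 0.f, 0.f, 0.f};
    for (int64_t i = start; i < end; ++i) {
        for (int64_t q = 0; q < 4; ++q)
            sum[q] += A[i * 4 + q];   /* fadd %10, %8 in for.body3 */
    }
    return sum[0] + sum[1] + sum[2] + sum[3];
}
```

With A = {1, 2, ..., 8}, foo(0, 2, A) adds all eight elements and returns 36; foo(1, 2, A) covers only the second group of four and returns 26.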
<br>
Here is the IR produced with clang -O0:<br>
<br>
<br>
define float @foo(i64 %start, i64 %end, float* %A) #0 {<br>
entry:<br>
%start.addr = alloca i64, align 8<br>
%end.addr = alloca i64, align 8<br>
%A.addr = alloca float*, align 8<br>
%sum = alloca [4 x float], align 16<br>
%i = alloca i64, align 8<br>
%q = alloca i64, align 8<br>
store i64 %start, i64* %start.addr, align 8<br>
store i64 %end, i64* %end.addr, align 8<br>
store float* %A, float** %A.addr, align 8<br>
%0 = bitcast [4 x float]* %sum to i8*<br>
call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 16, i32 16, i1
false)<br>
%1 = load i64* %start.addr, align 8<br>
store i64 %1, i64* %i, align 8<br>
br label %for.cond<br>
<br>
for.cond: ; preds =
%for.inc6, %entry<br>
%2 = load i64* %i, align 8<br>
%3 = load i64* %end.addr, align 8<br>
%cmp = icmp slt i64 %2, %3<br>
br i1 %cmp, label %for.body, label %for.end8<br>
<br>
for.body: ; preds =
%for.cond<br>
store i64 0, i64* %q, align 8<br>
br label %for.cond1<br>
<br>
for.cond1: ; preds =
%for.inc, %for.body<br>
%4 = load i64* %q, align 8<br>
%cmp2 = icmp slt i64 %4, 4<br>
br i1 %cmp2, label %for.body3, label %for.end<br>
<br>
for.body3: ; preds =
%for.cond1<br>
%5 = load i64* %i, align 8<br>
%mul = mul nsw i64 %5, 4<br>
%6 = load i64* %q, align 8<br>
%add = add nsw i64 %mul, %6<br>
%7 = load float** %A.addr, align 8<br>
%arrayidx = getelementptr inbounds float* %7, i64 %add<br>
%8 = load float* %arrayidx, align 4<br>
%9 = load i64* %q, align 8<br>
%arrayidx4 = getelementptr inbounds [4 x float]* %sum, i32 0,
i64 %9<br>
%10 = load float* %arrayidx4, align 4<br>
%add5 = fadd float %10, %8<br>
store float %add5, float* %arrayidx4, align 4<br>
br label %for.inc<br>
<br>
for.inc: ; preds =
%for.body3<br>
%11 = load i64* %q, align 8<br>
%inc = add nsw i64 %11, 1<br>
store i64 %inc, i64* %q, align 8<br>
br label %for.cond1<br>
<br>
for.end: ; preds =
%for.cond1<br>
br label %for.inc6<br>
<br>
for.inc6: ; preds =
%for.end<br>
%12 = load i64* %i, align 8<br>
%inc7 = add nsw i64 %12, 1<br>
store i64 %inc7, i64* %i, align 8<br>
br label %for.cond<br>
<br>
for.end8: ; preds =
%for.cond<br>
%arrayidx9 = getelementptr inbounds [4 x float]* %sum, i32 0,
i64 0<br>
%13 = load float* %arrayidx9, align 4<br>
%arrayidx10 = getelementptr inbounds [4 x float]* %sum, i32 0,
i64 1<br>
%14 = load float* %arrayidx10, align 4<br>
%add11 = fadd float %13, %14<br>
%arrayidx12 = getelementptr inbounds [4 x float]* %sum, i32 0,
i64 2<br>
%15 = load float* %arrayidx12, align 4<br>
%add13 = fadd float %add11, %15<br>
%arrayidx14 = getelementptr inbounds [4 x float]* %sum, i32 0,
i64 3<br>
%16 = load float* %arrayidx14, align 4<br>
%add15 = fadd float %add13, %16<br>
ret float %add15<br>
}<br>
<br>
<br>
<br>
So, as the IR shows, the inner loop is not unrolled.<br>
<br>
opt -basicaa -loop-vectorize -debug-only=loop-vectorize
-vectorizer-min-trip-count=4 -S sum.ll<br>
<br>
LV: Checking a loop in "foo"<br>
LV: Found a loop: for.cond1<br>
LV: SCEV could not compute the loop exit count.<br>
LV: Not vectorizing.<br>
<br>
opt -basicaa -gvn -loop-vectorize -debug-only=loop-vectorize
-vectorizer-min-trip-count=4 -S sum.ll<br>
<br>
LV: Checking a loop in "foo"<br>
LV: Found a loop: for.cond1<br>
LV: Found an induction variable.<br>
LV: We don't allow storing to uniform addresses<br>
LV: Can't vectorize due to memory conflicts<br>
LV: Not vectorizing.<br>
<br>
<br>
Frank<br>
<br>
<br>
<br>
On 08/11/13 02:49, Renato Golin wrote:<br>
</div>
<blockquote
cite="mid:CAMSE1kd9St1WpvL=OsWZj9d3OXaCBxqgZF373Zj7fcG9OcF4Yg@mail.gmail.com"
type="cite">
<div dir="ltr">On 7 November 2013 17:18, Frank Winter <<a
moz-do-not-send="true" href="mailto:fwinter@jlab.org">fwinter@jlab.org</a>>
wrote:<br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote">
LV: We don't allow storing to uniform addresses<br>
</blockquote>
</div>
<br>
</div>
<div class="gmail_extra">This is triggering because it didn't
recognize sum[q] as a reduction variable in
canVectorizeInstrs(), but did recognize that sum[q] is loop
invariant in canVectorizeMemory().</div>
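To illustrate the distinction between the two checks, here is a minimal C sketch (hypothetical functions, not from the thread) of the same sum written two ways. In the first, the running total is a local scalar, which after mem2reg becomes an SSA value carried around the loop: the shape reduction detection is designed for. In the second, every iteration loads and stores *out, an address that does not change inside the loop, i.e. a store to a uniform address. Whether a given LLVM version vectorizes either form depends on the version and flags; this only shows the two patterns.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reduction: s is a local variable, so after mem2reg it is an
   SSA value updated once per iteration. */
float sum_reduction(const float *A, size_t n)
{
    float s = 0.f;
    for (size_t i = 0; i < n; ++i)
        s += A[i];
    return s;
}

/* Same arithmetic through a pointer: each iteration stores to *out, whose
   address is loop invariant. Unless the compiler can prove out does not
   alias A and promote *out to a register, this remains a store to a
   uniform address. */
void sum_uniform_store(const float *A, size_t n, float *out)
{
    for (size_t i = 0; i < n; ++i)
        *out += A[i];
}
```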
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">I'm guessing the nested loop was fully
unrolled because of its low trip count and then removed, so it
ended up as:</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">
<div class="gmail_extra">float foo( int start , int end ,
float * A )</div>
<div class="gmail_extra">{</div>
<div class="gmail_extra"> float sum[4] = {0.,0.,0.,0.};</div>
<div class="gmail_extra"> for (int i = start ; i < end ;
++i ) {</div>
<div class="gmail_extra"> sum[0] += A[i*4+0];<br>
</div>
<div class="gmail_extra"> sum[1] += A[i*4+1];<br>
</div>
<div class="gmail_extra"> sum[2] += A[i*4+2];<br>
</div>
<div class="gmail_extra"> sum[3] += A[i*4+3];<br>
</div>
<div class="gmail_extra"> }</div>
<div class="gmail_extra"> return sum[0]+sum[1]+sum[2]+sum[3];</div>
<div class="gmail_extra">}</div>
<div><br>
</div>
<div>but, for some reason, sum[q] wasn't recognized as a
reduction variable, maybe because it was an array of
reduction variables?</div>
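One way to probe the "array of reduction variables" hypothesis is to replace sum[4] by four independent scalar accumulators. The sketch below is a hypothetical variant, not code from the thread, and it is not established here that the loop vectorizer accepts it. It uses int64_t indices to match the 64-bit change above and keeps the per-lane addition order, so it returns exactly the same float result as the array version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of the unrolled loop: sum[0..3] become s0..s3,
   so each accumulator is an ordinary scalar reduction variable. */
float foo_scalar_acc(int64_t start, int64_t end, const float *A)
{
    float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
    for (int64_t i = start; i < end; ++i) {
        s0 += A[i * 4 + 0];
        s1 += A[i * 4 + 1];
        s2 += A[i * 4 + 2];
        s3 += A[i * 4 + 3];
    }
    /* Same combination order as sum[0]+sum[1]+sum[2]+sum[3]. */
    return s0 + s1 + s2 + s3;
}
```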
<div><br>
</div>
<div>Having the IR would certainly help...</div>
<div><br>
</div>
<div>cheers,</div>
<div>--renato</div>
</div>
</div>
</blockquote>
<br>
<br>
</body>
</html>