<br><br><div class="gmail_quote">2012/2/27 Benjamin Kramer <span dir="ltr"><<a href="mailto:benny.kra@googlemail.com">benny.kra@googlemail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5"><br>

On 27.02.2012, at 18:49, Eli Friedman wrote:<br>

<br>

> On Mon, Feb 27, 2012 at 9:30 AM, Benjamin Kramer<br>

> <<a href="mailto:benny.kra@googlemail.com">benny.kra@googlemail.com</a>> wrote:<br>

>><br>

>> On 27.02.2012, at 17:13, îÉËĎĚÁĘ ěÉČĎÇŇŐÄ wrote:<br>

>><br>

>>> Dear LLVM,<br>

>>><br>

>>> š š Consider two loops with one interation -<br>

>>> š š First with constant lower bound, second with usual non-constant lower bound:<br>

>>><br>

>>> š š int main(int argc, char ** argv)<br>

>>> š š {<br>

>>> š š š š int numOfIterations= 1;<br>

>>> š š š š int stride=1;<br>

>>> š š š š int lowerBound = 1000; - 1st | int lowerBound = argc; - 2nd<br>

>>> š š š š int upperBound = lowerBound + (numOfIterations - 1)*stride;<br>

>>><br>

>>> š š š š int i = lowerBound;<br>

>>><br>

>>> š š š š int s = 0;<br>

>>> š š š š while(i <= upperBound)<br>

>>> š š š š {<br>

>>> š š š š š šs += i;<br>

>>> š š š š š ši++;<br>

>>> š š š š }<br>

>>> š š š š return s;<br>

>>> š š }<br>

>>> š šAfter the application of -O3 optimization:<br>

>>><br>

>>> š šThe first loop was unrolled:<br>

>>><br>

>>> š š š šdefine i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable readnone {<br>

>>> š š š šentry:<br>

>>> š š š š šret i32 1000<br>

>>> š š š š}<br>

>>><br>

>>> š šBut the second was not:<br>

>>><br>

>>> š š š šdefine i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable readnone {<br>

>>> š š š šentry:<br>

>>> š š š š šbr label %"3"<br>

>>><br>

>>> š š š š"3": š š š š š š š š š š š š š š š š š š š š š š š; preds = %entry, %"3"<br>

>>> š š š š š %0 = phi i32 [ 0, %entry ], [ %2, %"3" ]<br>

>>> š š š š š %1 = phi i32 [ %argc, %entry ], [ %3, %"3" ]<br>

>>> š š š š š %2 = add i32 %0, %1<br>

>>> š š š š š %3 = add i32 %1, 1<br>

>>> š š š š š %4 = icmp sgt i32 %3, %argc<br>

>>> š š š š š br i1 %4, label %"5", label %"3"<br>

>>><br>

>>> š š š š"5": š š š š š š š š š š š š š š š š š š š š š š š; preds = %"3"<br>

>>> š š š š š ret i32 %2<br>

>>> š š š }<br>

>>><br>

>>> š š The questions:<br>

>>> š š š Is it possible to unroll loop with non-constant boundary using standard LLVM 3.1 facilities, and, if it is, how I can do that?<br>

>><br>

>> Hi Nicolas,<br>

>><br>

>> LLVM misses this optimization because ScalarEvolution's ComputeExitLimitFromICmp doesn't handle signed <= (SLE) and thus can't compute the number of times the loop is executed. I wonder if there's a reason for this, it seems like something simple to add.<br>


>><br>

>> Attached is a completely untested patch that adds the missing logic.<br>

><br>

> Off the top of my head, the attached is missing a check for whether<br>

> the increment is nsw, and the upper bound computation is wrong.<br>

<br>

</div></div>No doubt, it was just a test if this would work at all.<br>

<br>

Looking deeper into SCEV, this should really be canonicalized in SimplifyICmpOperands. Currently it checks whether the value cannot possibly be the maximum or minimum signed value and only does the transform if that's the case.<br>


<br>

I wonder if that condition could be weakened a bit.<br>

<br>

- Ben</blockquote></div><br><div><div>Hi, Benjamin!</div><div>š šThank you for your answer, with your patch it works better.</div><div>š šBut I still have a problem - loops are unrolled only with stride equal one.</div><div>

š šFor example,š</div><div>š š{</div><div>š š int i = lowerBound;</div><div>š š int s = 0;</div><div>š š while(i <= upperBound)</div><div>š š {</div><div>š š š š s += i;</div><div>š š š š i+=32;</div><div>š š }</div><div>

š } šis not urolled!</div><div>š At first, I think that reason is at condition</div><div>š if (isSigned) {</div><div>š š š š APInt Max = APInt::getSignedMaxValue(BitWidth);</div><div>š š š š if ((Max - getSignedRange(getMinusSCEV(Step, One)).getSignedMax())</div>

<div>š š š š š š š .slt(getSignedRange(RHS).getSignedMax()))</div><div>š š š š š return getCouldNotCompute();</div><div>š š š }</div><div>š in "ScalarEvolution::ExitLimit ScalarEvolution::HowManyLessThans(const SCEV *LHS, const SCEV *RHS,</div>

<div>š š š š š š š š š š š š š š š š š const Loop *L, bool isSigned)",</div><div>š š which, In my understanding, is performed for all LHS and RHS with same bit size and not equal one Step, so function returns getCouldNotCompute(), please, correct me, if it is not.</div>

<div>š šBut next, I change type of i to int64_t, that condition become false, function continued, but loop still was not unrolled.š</div><div>š šSo, the questions:</div><div>š š 1. How to overcome upper condition for LHS and RHS with same bit size and not equal one stride?</div>

<div>š š 2. What more prevents the calculation of number of interations and how to deal with it?</div><div><br></div><div>Best regards,</div><div>š š š Nicolas</div></div>