Possible trip-count overflow in loop-unroll

Thu Nov 20 09:33:38 PST 2014

Thank you for the explanation! I'll abandon my patch on Phab.

My only suggestion would be to trim the test case if possible.

On Thu, Nov 20, 2014 at 10:03 AM, Michael Zolotukhin <mzolotukhin at apple.com>
wrote:

> Hi Sanjay,
>
> Indeed, we’ve been working on very similar problems, and actually your fix
> is included into my patch. However, along with the issue your fix is
> targeting, there is one more. It shows up when trip-count isn’t known at
> compile time (replace %cmp = icmp eq i32 %inc, 0 with %cmp = icmp eq i32
> %inc, %N in your example) and we use unroll-factor=2. Using
> unroll-factor=2 is crucial to expose this bug, because for other cases the
> prologue is a loop, and we use this loop if at runtime we find out that the
> trip count overflows. To make this possible for unroll-factor=2 as well we
> need to stop unrolling the prologue if overflow is possible.
>
> I included a test-case to my patch, and return values of @foo differs
> depending on whether the loop was unrolled or not when trunk compiler is
> used. You can inspect that by adding a @main function to print the return
> value.
>
> Below you can find outputs with and without my change, along with my
> annotation:
>
> Without my change (please not that it’s actually not a loop):
> entry:
>   %0 = add i32 %N, 1                    ; <<< If %N is -1, %0 overflows
> and becomes 0 here.
>   %xtraiter = and i32 %0, 1
>   %lcmp.mod = icmp ne i32 %xtraiter, 0
>   %lcmp.overflow = icmp eq i32 %0, 0    ; <<< Here we actually detect the
> overflow, and our
>                                         ;     plan is to go to prologue
> loop to handle that.
>   %lcmp.or = or i1 %lcmp.overflow, %lcmp.mod
>   br i1 %lcmp.or, label %while.body.prol, label %entry.split ; <<< Here we
> jump to the prologue
>
> while.body.prol:                                  ; preds = %entry
>   %1 = bitcast i32 0 to float
>   %cmp.prol = fcmp oge float %1, 0.000000e+00
>   %cmp2.prol = fcmp ole float %1, 1.000000e+00
>   %or.cond.prol = and i1 %cmp.prol, %cmp2.prol
>   %inc.prol = zext i1 %or.cond.prol to i32
>   %inc.count.0.prol = add i32 0, %inc.prol
>   %cmp4.prol = icmp eq i32 0, %N         ; <<< But it’s not a loop any
> more!
>   %inc7.prol = add i32 0, 1              ; <<< We execute a single
> iteration and proceed
>   br label %entry.split                  ; <<<
>
> entry.split:                                      ; preds =
> %while.body.prol, %entry
>   %inc.count.0.lcssa.unr = phi i32 [ 0, %entry ], [ %inc.count.0.prol,
> %while.body.prol ]
>   %storemerge.unr = phi i32 [ 0, %entry ], [ %inc7.prol, %while.body.prol ]
>   %count.0.unr = phi i32 [ 0, %entry ], [ %inc.count.0.prol,
> %while.body.prol ]
>   %2 = icmp ult i32 %0, 2
>   br i1 %2, label %while.end, label %entry.split.split ; <<< And here we
> skip the unrolled loop and go to
>                                                        ;     function exit
>
> entry.split.split:                                ; preds = %entry.split
>   br label %while.body
>
> while.body:                                       ; preds = %while.body,
> %entry.split.split
>   %storemerge = phi i32 [ %storemerge.unr, %entry.split.split ], [
> %inc7.1, %while.body ]
>   %count.0 = phi i32 [ %count.0.unr, %entry.split.split ], [
> %inc.count.0.1, %while.body ]
>   %3 = bitcast i32 %storemerge to float
>   %cmp = fcmp oge float %3, 0.000000e+00
>   %cmp2 = fcmp ole float %3, 1.000000e+00
>   %or.cond = and i1 %cmp, %cmp2
>   %inc = zext i1 %or.cond to i32
>   %inc.count.0 = add i32 %count.0, %inc
>   %inc7 = add i32 %storemerge, 1
>   %4 = bitcast i32 %inc7 to float
>   %cmp.1 = fcmp oge float %4, 0.000000e+00
>   %cmp2.1 = fcmp ole float %4, 1.000000e+00
>   %or.cond.1 = and i1 %cmp.1, %cmp2.1
>   %inc.1 = zext i1 %or.cond.1 to i32
>   %inc.count.0.1 = add i32 %inc.count.0, %inc.1
>   %cmp4.1 = icmp eq i32 %inc7, %N
>   %inc7.1 = add i32 %inc7, 1
>   br i1 %cmp4.1, label %while.end.unr-lcssa, label %while.body
>
>
> With my change (the difference is only in while.body.prol - it’s a loop
> now):
>
> while.body.prol:                                  ; preds =
> %while.body.prol, %entry
>   %storemerge.prol = phi i32 [ 0, %entry ], [ %inc7.prol, %while.body.prol
> ]
>   %count.0.prol = phi i32 [ 0, %entry ], [ %inc.count.0.prol,
> %while.body.prol ]
>   %prol.iter = phi i32 [ %xtraiter, %entry ], [ %prol.iter.sub,
> %while.body.prol ]
>   %1 = bitcast i32 %storemerge.prol to float
>   %cmp.prol = fcmp oge float %1, 0.000000e+00
>   %cmp2.prol = fcmp ole float %1, 1.000000e+00
>   %or.cond.prol = and i1 %cmp.prol, %cmp2.prol
>   %inc.prol = zext i1 %or.cond.prol to i32
>   %inc.count.0.prol = add i32 %count.0.prol, %inc.prol
>   %cmp4.prol = icmp eq i32 %storemerge.prol, %N
>   %inc7.prol = add i32 %storemerge.prol, 1
>   %prol.iter.sub = sub i32 %prol.iter, 1
>   %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0
>   br i1 %prol.iter.cmp, label %while.body.prol, label %entry.split,
> !llvm.loop !0
>
> Thanks,
> Michael
>
>
> On Nov 20, 2014, at 8:19 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>
> Hi Michael,
>
> I have a similar patch to fix this here:
> http://reviews.llvm.org/D6200
>
> Based on this newer bug report:
> http://llvm.org/bugs/show_bug.cgi?id=21409
>
> How does your change to the prologue check affect the transform? Can you
> post example code with and without that change? I minimized the test case
> in my patch, so it might be easier to explain using that as the example
> code.
>
>
> On Wed, Nov 19, 2014 at 6:36 PM, Michael Zolotukhin <mzolotukhin at apple.com
> > wrote:
>
>> Hi,
>>
>> This patch fixes an issue with possible integer overflow that might be
>> caused by LoopUnroll. The problem was reported here: [1], and than it was
>> partially solved (see [2]), but not completely.
>>
>> This fix aims the following issue in LoopUnroll pass: currently
>> LoopUnroll generates a prologue loop before the main loop body to execute
>> first N%UnrollFactor iterations. When UnrollFactor is 2, then this loop
>> performs a single iteration, and thus its back-edge is optimized away, and
>> the loop actually becomes just a linear code.
>>
>> That would work fine, but it turns out that we also want to use this
>> un-unrolled loop as a last resort if for some reason we can’t use unrolled
>> loop. One of such reasons is possible trip-count overflow.
>>
>> The attached patch makes the condition when we fully unroll the prologue
>> more strict, and thusly fix the issue in question.
>>
>> Does it look ok?
>>
>>
>>
>> Thanks,
>> Michael
>>
>> [1]:
>> http://stackoverflow.com/questions/23838661/why-is-clang-optimizing-this-code-out
>> [2]: http://llvm.org/bugs/show_bug.cgi?id=19846
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141120/968b72af/attachment.html>