Possible trip-count overflow in loop-unroll

Michael Zolotukhin mzolotukhin at apple.com
Thu Nov 20 09:03:59 PST 2014


Hi Sanjay,

Indeed, we’ve been working on very similar problems, and actually your fix is included into my patch. However, along with the issue your fix is targeting, there is one more. It shows up when trip-count isn’t known at compile time (replace %cmp = icmp eq i32 %inc, 0 with %cmp = icmp eq i32 %inc, %N in your example) and we use unroll-factor=2. Using unroll-factor=2 is crucial to expose this bug, because for other cases the prologue is a loop, and we use this loop if at runtime we find out that the trip count overflows. To make this possible for unroll-factor=2 as well we need to stop unrolling the prologue if overflow is possible.

I included a test-case to my patch, and return values of @foo differs depending on whether the loop was unrolled or not when trunk compiler is used. You can inspect that by adding a @main function to print the return value.

Below you can find outputs with and without my change, along with my annotation:

Without my change (please not that it’s actually not a loop):
entry:
  %0 = add i32 %N, 1                    ; <<< If %N is -1, %0 overflows and becomes 0 here.
  %xtraiter = and i32 %0, 1
  %lcmp.mod = icmp ne i32 %xtraiter, 0
  %lcmp.overflow = icmp eq i32 %0, 0    ; <<< Here we actually detect the overflow, and our
                                        ;     plan is to go to prologue loop to handle that.
  %lcmp.or = or i1 %lcmp.overflow, %lcmp.mod
  br i1 %lcmp.or, label %while.body.prol, label %entry.split ; <<< Here we jump to the prologue

while.body.prol:                                  ; preds = %entry
  %1 = bitcast i32 0 to float
  %cmp.prol = fcmp oge float %1, 0.000000e+00
  %cmp2.prol = fcmp ole float %1, 1.000000e+00
  %or.cond.prol = and i1 %cmp.prol, %cmp2.prol
  %inc.prol = zext i1 %or.cond.prol to i32
  %inc.count.0.prol = add i32 0, %inc.prol
  %cmp4.prol = icmp eq i32 0, %N         ; <<< But it’s not a loop any more!
  %inc7.prol = add i32 0, 1              ; <<< We execute a single iteration and proceed
  br label %entry.split                  ; <<<

entry.split:                                      ; preds = %while.body.prol, %entry
  %inc.count.0.lcssa.unr = phi i32 [ 0, %entry ], [ %inc.count.0.prol, %while.body.prol ]
  %storemerge.unr = phi i32 [ 0, %entry ], [ %inc7.prol, %while.body.prol ]
  %count.0.unr = phi i32 [ 0, %entry ], [ %inc.count.0.prol, %while.body.prol ]
  %2 = icmp ult i32 %0, 2
  br i1 %2, label %while.end, label %entry.split.split ; <<< And here we skip the unrolled loop and go to
                                                       ;     function exit

entry.split.split:                                ; preds = %entry.split
  br label %while.body

while.body:                                       ; preds = %while.body, %entry.split.split
  %storemerge = phi i32 [ %storemerge.unr, %entry.split.split ], [ %inc7.1, %while.body ]
  %count.0 = phi i32 [ %count.0.unr, %entry.split.split ], [ %inc.count.0.1, %while.body ]
  %3 = bitcast i32 %storemerge to float
  %cmp = fcmp oge float %3, 0.000000e+00
  %cmp2 = fcmp ole float %3, 1.000000e+00
  %or.cond = and i1 %cmp, %cmp2
  %inc = zext i1 %or.cond to i32
  %inc.count.0 = add i32 %count.0, %inc
  %inc7 = add i32 %storemerge, 1
  %4 = bitcast i32 %inc7 to float
  %cmp.1 = fcmp oge float %4, 0.000000e+00
  %cmp2.1 = fcmp ole float %4, 1.000000e+00
  %or.cond.1 = and i1 %cmp.1, %cmp2.1
  %inc.1 = zext i1 %or.cond.1 to i32
  %inc.count.0.1 = add i32 %inc.count.0, %inc.1
  %cmp4.1 = icmp eq i32 %inc7, %N
  %inc7.1 = add i32 %inc7, 1
  br i1 %cmp4.1, label %while.end.unr-lcssa, label %while.body


With my change (the difference is only in while.body.prol - it’s a loop now):

while.body.prol:                                  ; preds = %while.body.prol, %entry
  %storemerge.prol = phi i32 [ 0, %entry ], [ %inc7.prol, %while.body.prol ]
  %count.0.prol = phi i32 [ 0, %entry ], [ %inc.count.0.prol, %while.body.prol ]
  %prol.iter = phi i32 [ %xtraiter, %entry ], [ %prol.iter.sub, %while.body.prol ]
  %1 = bitcast i32 %storemerge.prol to float
  %cmp.prol = fcmp oge float %1, 0.000000e+00
  %cmp2.prol = fcmp ole float %1, 1.000000e+00
  %or.cond.prol = and i1 %cmp.prol, %cmp2.prol
  %inc.prol = zext i1 %or.cond.prol to i32
  %inc.count.0.prol = add i32 %count.0.prol, %inc.prol
  %cmp4.prol = icmp eq i32 %storemerge.prol, %N
  %inc7.prol = add i32 %storemerge.prol, 1
  %prol.iter.sub = sub i32 %prol.iter, 1
  %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0
  br i1 %prol.iter.cmp, label %while.body.prol, label %entry.split, !llvm.loop !0

Thanks,
Michael


> On Nov 20, 2014, at 8:19 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> 
> Hi Michael,
> 
> I have a similar patch to fix this here:
> http://reviews.llvm.org/D6200 <http://reviews.llvm.org/D6200>
> 
> Based on this newer bug report:
> http://llvm.org/bugs/show_bug.cgi?id=21409 <http://llvm.org/bugs/show_bug.cgi?id=21409>
> 
> How does your change to the prologue check affect the transform? Can you post example code with and without that change? I minimized the test case in my patch, so it might be easier to explain using that as the example code.
> 
> 
> On Wed, Nov 19, 2014 at 6:36 PM, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote:
> Hi,
> 
> This patch fixes an issue with possible integer overflow that might be caused by LoopUnroll. The problem was reported here: [1], and than it was partially solved (see [2]), but not completely.
> 
> This fix aims the following issue in LoopUnroll pass: currently LoopUnroll generates a prologue loop before the main loop body to execute first N%UnrollFactor iterations. When UnrollFactor is 2, then this loop performs a single iteration, and thus its back-edge is optimized away, and the loop actually becomes just a linear code. 
> 
> That would work fine, but it turns out that we also want to use this un-unrolled loop as a last resort if for some reason we can’t use unrolled loop. One of such reasons is possible trip-count overflow.
> 
> The attached patch makes the condition when we fully unroll the prologue more strict, and thusly fix the issue in question.
> 
> Does it look ok?
> 
> 
> 
> Thanks,
> Michael
> 
> [1]: http://stackoverflow.com/questions/23838661/why-is-clang-optimizing-this-code-out <http://stackoverflow.com/questions/23838661/why-is-clang-optimizing-this-code-out>
> [2]: http://llvm.org/bugs/show_bug.cgi?id=19846 <http://llvm.org/bugs/show_bug.cgi?id=19846>
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141120/c1f569e5/attachment.html>


More information about the llvm-commits mailing list