[llvm] r229731 - Partial fix for bug 22589

Juergen Ributzka juergen at apple.com
Thu Mar 5 11:35:03 PST 2015


“ExpandShiftByConstant” was written with the assumption that zero shifts would always be optimized away. The code doesn’t check for it nor does it produce correct code when that happens. I added the assert to make sure we catch this case in the future when someone violates this implicit contract/assumption of this method.

—Juergen

> On Mar 5, 2015, at 11:24 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> +CC Juergen since he added the assert
> 
> This looks like a codegen issue.  We end up with a SHL with an Amt of
> zero and trip an assert on that.
> 
>  $ ./bin/llc -mcpu=yonah fail.ll
> 
> will reproduce the crash.
> 
> I don't have the context to debug this though -- why is the assert
> expected to not trip?  What is the invariant that transforms over and
> construction of SDNodes are supposed to preserve?
> 
> AFAICT, in the current case, the shift amount starts off being a
> EXTRACT_VECTOR_ELT of a BUILD_VECTOR that gets constant folded to 0.
> 
> The reproducing .ll file is attached.
> 
> -- Sanjoy
> 
> On Wed, Mar 4, 2015 at 9:36 PM, Sanjoy Das
> <sanjoy at playingwithpointers.com <mailto:sanjoy at playingwithpointers.com>> wrote:
>> Hi Michael,
>> 
>> I'm looking at this now.
>> 
>> -- Sanjoy
>> 
>> On Wed, Mar 4, 2015 at 9:21 PM, Michael Zolotukhin
>> <mzolotukhin at apple.com> wrote:
>>> Hi Sanjoy,
>>> 
>>> This change introduced (or exposed) some assertion failures. Could you
>>> please take a look? I’ll try too, but you probably could find the rootcause
>>> faster.
>>> 
>>> Reproducer:
>>>> cat fail.c
>>> typedef unsigned long long uint64_t;
>>> uint64_t foo(unsigned char *a, unsigned l) {
>>>  uint64_t Val = 0;
>>>  for (unsigned i = 0; i != l; ++i)
>>>    Val |= (uint64_t)a[i] << i*8;
>>>  return Val;
>>> }
>>> 
>>>> clang -target i686 -march=yonah -O3 -c -gdwarf-2 fail.c
>>> Assertion failed: (Amt && "Expected zero shifts to be already optimized
>>> away."), function ExpandShiftByConstant, file
>>> …llvm.git/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp, line 1420.
>>> 0  clang-3.7                0x000000010d6b2e0e
>>> llvm::sys::PrintStackTrace(__sFILE*) + 46
>>> 1  clang-3.7                0x000000010d6b41cb
>>> PrintStackTraceSignalHandler(void*) + 27
>>> 2  clang-3.7                0x000000010d6b4615 SignalHandler(int) + 565
>>> 3  libsystem_platform.dylib 0x00007fff95515f1a _sigtramp + 26
>>> 4  clang-3.7                0x0000000110668d68 guard variable for
>>> isAllowedIDChar(unsigned int, clang::LangOptions const&)::C99AllowedIDChars
>>> + 97488
>>> 5  clang-3.7                0x000000010d6b41fb raise + 27
>>> 6  clang-3.7                0x000000010d6b42b2 abort + 18
>>> 7  clang-3.7                0x000000010d6b4291 __assert_rtn + 129
>>> 8  clang-3.7                0x000000010dd3e943
>>> llvm::DAGTypeLegalizer::ExpandShiftByConstant(llvm::SDNode*, unsigned int,
>>> llvm::SDValue&, llvm::SDValue&) + 163
>>> 9  clang-3.7                0x000000010dd3b550
>>> llvm::DAGTypeLegalizer::ExpandIntRes_Shift(llvm::SDNode*, llvm::SDValue&,
>>> llvm::SDValue&) + 240
>>> 10 clang-3.7                0x000000010dd311a5
>>> llvm::DAGTypeLegalizer::ExpandIntegerResult(llvm::SDNode*, unsigned int) +
>>> 3221
>>> 
>>> It works for r229730 and fails for r229731.
>>> 
>>> Thanks,
>>> Michael
>>> 
>>> On Feb 18, 2015, at 11:32 AM, Sanjoy Das <sanjoy at playingwithpointers.com>
>>> wrote:
>>> 
>>> Author: sanjoy
>>> Date: Wed Feb 18 13:32:25 2015
>>> New Revision: 229731
>>> 
>>> URL: http://llvm.org/viewvc/llvm-project?rev=229731&view=rev
>>> Log:
>>> Partial fix for bug 22589
>>> 
>>> Don't spend the entire iteration space in the scalar loop prologue if
>>> computing the trip count overflows.  This change also gets rid of the
>>> backedge check in the prologue loop and the extra check for
>>> overflowing trip-count.
>>> 
>>> Differential Revision: http://reviews.llvm.org/D7715
>>> 
>>> 
>>> Modified:
>>>   llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp
>>>   llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll
>>>   llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll
>>>   llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll
>>> 
>>> Modified: llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp?rev=229731&r1=229730&r2=229731&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp Wed Feb 18
>>> 13:32:25 2015
>>> @@ -58,7 +58,7 @@ STATISTIC(NumRuntimeUnrolled,
>>> /// - Branch around the original loop if the trip count is less
>>> ///   than the unroll factor.
>>> ///
>>> -static void ConnectProlog(Loop *L, Value *TripCount, unsigned Count,
>>> +static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,
>>>                          BasicBlock *LastPrologBB, BasicBlock *PrologEnd,
>>>                          BasicBlock *OrigPH, BasicBlock *NewPH,
>>>                          ValueToValueMapTy &VMap, AliasAnalysis *AA,
>>> @@ -109,12 +109,19 @@ static void ConnectProlog(Loop *L, Value
>>>    }
>>>  }
>>> 
>>> -  // Create a branch around the orignal loop, which is taken if the
>>> -  // trip count is less than the unroll factor.
>>> +  // Create a branch around the orignal loop, which is taken if there are
>>> no
>>> +  // iterations remaining to be executed after running the prologue.
>>>  Instruction *InsertPt = PrologEnd->getTerminator();
>>> +
>>> +  assert(Count != 0 && "nonsensical Count!");
>>> +
>>> +  // If BECount <u (Count - 1) then (BECount + 1) & (Count - 1) == (BECount
>>> + 1)
>>> +  // (since Count is a power of 2).  This means %xtraiter is (BECount + 1)
>>> and
>>> +  // and all of the iterations of this loop were executed by the prologue.
>>> Note
>>> +  // that if BECount <u (Count - 1) then (BECount + 1) cannot
>>> unsigned-overflow.
>>>  Instruction *BrLoopExit =
>>> -    new ICmpInst(InsertPt, ICmpInst::ICMP_ULT, TripCount,
>>> -                 ConstantInt::get(TripCount->getType(), Count));
>>> +    new ICmpInst(InsertPt, ICmpInst::ICMP_ULT, BECount,
>>> +                 ConstantInt::get(BECount->getType(), Count - 1));
>>>  BasicBlock *Exit = L->getUniqueExitBlock();
>>>  assert(Exit && "Loop must have a single exit block only");
>>>  // Split the exit to maintain loop canonicalization guarantees
>>> @@ -291,23 +298,28 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>> 
>>>  // Only unroll loops with a computable trip count and the trip count needs
>>>  // to be an int value (allowing a pointer type is a TODO item)
>>> -  const SCEV *BECount = SE->getBackedgeTakenCount(L);
>>> -  if (isa<SCEVCouldNotCompute>(BECount) ||
>>> !BECount->getType()->isIntegerTy())
>>> +  const SCEV *BECountSC = SE->getBackedgeTakenCount(L);
>>> +  if (isa<SCEVCouldNotCompute>(BECountSC) ||
>>> +      !BECountSC->getType()->isIntegerTy())
>>>    return false;
>>> 
>>> -  // If BECount is INT_MAX, we can't compute trip-count without overflow.
>>> -  if (BECount->isAllOnesValue())
>>> -    return false;
>>> +  unsigned BEWidth =
>>> cast<IntegerType>(BECountSC->getType())->getBitWidth();
>>> 
>>>  // Add 1 since the backedge count doesn't include the first loop iteration
>>>  const SCEV *TripCountSC =
>>> -    SE->getAddExpr(BECount, SE->getConstant(BECount->getType(), 1));
>>> +    SE->getAddExpr(BECountSC, SE->getConstant(BECountSC->getType(), 1));
>>>  if (isa<SCEVCouldNotCompute>(TripCountSC))
>>>    return false;
>>> 
>>>  // We only handle cases when the unroll factor is a power of 2.
>>>  // Count is the loop unroll factor, the number of extra copies added + 1.
>>> -  if ((Count & (Count-1)) != 0)
>>> +  if (!isPowerOf2_32(Count))
>>> +    return false;
>>> +
>>> +  // This constraint lets us deal with an overflowing trip count easily;
>>> see the
>>> +  // comment on ModVal below.  This check is equivalent to `Log2(Count) <
>>> +  // BEWidth`.
>>> +  if (static_cast<uint64_t>(Count) > (1ULL << BEWidth))
>>>    return false;
>>> 
>>>  // If this loop is nested, then the loop unroller changes the code in
>>> @@ -333,16 +345,23 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>  SCEVExpander Expander(*SE, "loop-unroll");
>>>  Value *TripCount = Expander.expandCodeFor(TripCountSC,
>>> TripCountSC->getType(),
>>>                                            PreHeaderBR);
>>> +  Value *BECount = Expander.expandCodeFor(BECountSC, BECountSC->getType(),
>>> +                                          PreHeaderBR);
>>> 
>>>  IRBuilder<> B(PreHeaderBR);
>>>  Value *ModVal = B.CreateAnd(TripCount, Count - 1, "xtraiter");
>>> 
>>> -  // Check if for no extra iterations, then jump to cloned/unrolled loop.
>>> -  // We have to check that the trip count computation didn't overflow when
>>> -  // adding one to the backedge taken count.
>>> -  Value *LCmp = B.CreateIsNotNull(ModVal, "lcmp.mod");
>>> -  Value *OverflowCheck = B.CreateIsNull(TripCount, "lcmp.overflow");
>>> -  Value *BranchVal = B.CreateOr(OverflowCheck, LCmp, "lcmp.or");
>>> +  // If ModVal is zero, we know that either
>>> +  //  1. there are no iteration to be run in the prologue loop
>>> +  // OR
>>> +  //  2. the addition computing TripCount overflowed
>>> +  //
>>> +  // If (2) is true, we know that TripCount really is (1 << BEWidth) and so
>>> the
>>> +  // number of iterations that remain to be run in the original loop is a
>>> +  // multiple Count == (1 << Log2(Count)) because Log2(Count) <= BEWidth
>>> (we
>>> +  // explicitly check this above).
>>> +
>>> +  Value *BranchVal = B.CreateIsNotNull(ModVal, "lcmp.mod");
>>> 
>>>  // Branch to either the extra iterations or the cloned/unrolled loop
>>>  // We will fix up the true branch label when adding loop body copies
>>> @@ -365,10 +384,7 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>  std::vector<BasicBlock *> NewBlocks;
>>>  ValueToValueMapTy VMap;
>>> 
>>> -  // If unroll count is 2 and we can't overflow in tripcount computation
>>> (which
>>> -  // is BECount + 1), then we don't need a loop for prologue, and we can
>>> unroll
>>> -  // it. We can be sure that we don't overflow only if tripcount is a
>>> constant.
>>> -  bool UnrollPrologue = (Count == 2 && isa<ConstantInt>(TripCount));
>>> +  bool UnrollPrologue = Count == 2;
>>> 
>>>  // Clone all the basic blocks in the loop. If Count is 2, we don't clone
>>>  // the loop, otherwise we create a cloned loop to execute the extra
>>> @@ -394,7 +410,7 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>  // Connect the prolog code to the original loop and update the
>>>  // PHI functions.
>>>  BasicBlock *LastLoopBB = cast<BasicBlock>(VMap[Latch]);
>>> -  ConnectProlog(L, TripCount, Count, LastLoopBB, PEnd, PH, NewPH, VMap,
>>> +  ConnectProlog(L, BECount, Count, LastLoopBB, PEnd, PH, NewPH, VMap,
>>>                /*AliasAnalysis*/ nullptr, DT, LI, LPM->getAsPass());
>>>  NumRuntimeUnrolled++;
>>>  return true;
>>> 
>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll?rev=229731&r1=229730&r2=229731&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll (original)
>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll Wed Feb 18
>>> 13:32:25 2015
>>> @@ -4,9 +4,7 @@
>>> 
>>> ; CHECK: %xtraiter = and i32 %n
>>> ; CHECK:  %lcmp.mod = icmp ne i32 %xtraiter, 0
>>> -; CHECK:  %lcmp.overflow = icmp eq i32 %n, 0
>>> -; CHECK:  %lcmp.or = or i1 %lcmp.overflow, %lcmp.mod
>>> -; CHECK:  br i1 %lcmp.or, label %for.body.prol, label
>>> %for.body.preheader.split
>>> +; CHECK:  br i1 %lcmp.mod, label %for.body.prol, label
>>> %for.body.preheader.split
>>> 
>>> ; CHECK: for.body.prol:
>>> ; CHECK: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body.prol
>>> ], [ 0, %for.body.preheader ]
>>> 
>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll?rev=229731&r1=229730&r2=229731&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll (original)
>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll Wed Feb 18
>>> 13:32:25 2015
>>> @@ -3,7 +3,7 @@
>>> ; This tests that setting the unroll count works
>>> 
>>> ; CHECK: for.body.prol:
>>> -; CHECK: br i1 %prol.iter.cmp, label %for.body.prol, label
>>> %for.body.preheader.split
>>> +; CHECK: br label %for.body.preheader.split
>>> ; CHECK: for.body:
>>> ; CHECK: br i1 %exitcond.1, label %for.end.loopexit.unr-lcssa, label
>>> %for.body
>>> ; CHECK-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label
>>> %for.body
>>> 
>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll?rev=229731&r1=229730&r2=229731&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll (original)
>>> +++ llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll Wed Feb 18
>>> 13:32:25 2015
>>> @@ -1,19 +1,28 @@
>>> ; RUN: opt < %s -S -unroll-runtime -unroll-count=2 -loop-unroll | FileCheck
>>> %s
>>> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>> 
>>> -; When prologue is fully unrolled, the branch on its end is unconditional.
>>> -; Unrolling it is illegal if we can't prove that trip-count+1 doesn't
>>> overflow,
>>> -; like in this example, where it comes from an argument.
>>> -;
>>> -; This test is based on an example from here:
>>> -;
>>> http://stackoverflow.com/questions/23838661/why-is-clang-optimizing-this-code-out
>>> -;
>>> +; This test case documents how runtime loop unrolling handles the case
>>> +; when the backedge-count is -1.
>>> +
>>> +; If %N, the backedge-taken count, is -1 then %0 unsigned-overflows
>>> +; and is 0.  %xtraiter too is 0, signifying that the total trip-count
>>> +; is divisible by 2.  The prologue then branches to the unrolled loop
>>> +; and executes the 2^32 iterations there, in groups of 2.
>>> +
>>> +
>>> +; CHECK: entry:
>>> +; CHECK-NEXT: %0 = add i32 %N, 1
>>> +; CHECK-NEXT: %xtraiter = and i32 %0, 1
>>> +; CHECK-NEXT: %lcmp.mod = icmp ne i32 %xtraiter, 0
>>> +; CHECK-NEXT: br i1 %lcmp.mod, label %while.body.prol, label %entry.split
>>> +
>>> ; CHECK: while.body.prol:
>>> -; CHECK: br i1
>>> +; CHECK: br label %entry.split
>>> +
>>> ; CHECK: entry.split:
>>> 
>>> ; Function Attrs: nounwind readnone ssp uwtable
>>> -define i32 @foo(i32 %N) #0 {
>>> +define i32 @foo(i32 %N) {
>>> entry:
>>>  br label %while.body
>>> 
>>> @@ -26,5 +35,3 @@ while.body:
>>> while.end:                                        ; preds = %while.body
>>>  ret i32 %i
>>> }
>>> -
>>> -attributes #0 = { nounwind readnone ssp uwtable
>>> "less-precise-fpmad"="false" "no-frame-pointer-elim"="true"
>>> "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false"
>>> "no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
>>> "unsafe-fp-math"="false" "use-soft-float"="false" }
>>> 
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
>>> 
> <fail.ll>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150305/a45ff7cb/attachment.html>


More information about the llvm-commits mailing list