[llvm] r229731 - Partial fix for bug 22589
Michael Zolotukhin
mzolotukhin at apple.com
Thu Mar 5 17:20:31 PST 2015
Hi,
I committed a fix, suggested by Ahmed (and discussed with Juergen out of the list), in r231443.
Should it be merged to 3.6 branch too?
Thanks,
Michael
> On Mar 5, 2015, at 4:24 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
>
> Also note that this can be reduced to the obvious:
>
> define i64 @foo(<2 x i64> %a) {
> entry:
> %c = shl <2 x i64> %a, <i64 0, i64 2>
> %d = extractelement <2 x i64> %c, i32 0
> ret i64 %d
> }
>
> Which doesn't have anything to do with the OP commit.
>
> -Ahmed
>
>
> On Thu, Mar 5, 2015 at 4:19 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
>> On Thu, Mar 5, 2015 at 2:13 PM, Juergen Ributzka <juergen at apple.com> wrote:
>>> Not exactly, because it is supposed to break it up … but that should be still possible.
>>
>> FWIW, I believe this is the right fix; something like:
>>
>> if (!Amt) {
>> Lo = InL;
>> Hi = InH;
>> return;
>> }
>>
>> In theory, the assumption is valid, that the DAG combines should have
>> caught this earlier. However, here I believe the v2i64 shl (with
>> different shift amounts in each lane) gets expanded to i64 shifts, one
>> with a "0" amount, the other with some non-0 value. Between that
>> legalization and the legalization of the resulting i64 shl, there is
>> no round of DAG combining, so the assumption doesn't hold in practice.
>>
>> Ideally we would run the DAGCombines on the just-legalized nodes,
>> precisely for this reason. But that's a problem for another day =)
>>
>> Juergen, thoughts?
>>
>> -Ahmed
>>
>>> —Juergen
>>>
>>>> On Mar 5, 2015, at 11:44 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>>>
>>>> On Thu, Mar 5, 2015 at 11:35 AM, Juergen Ributzka <juergen at apple.com> wrote:
>>>>> “ExpandShiftByConstant” was written with the assumption that zero shifts
>>>>> would always be optimized away. The code doesn’t check for it nor does it
>>>>> produce correct code when that happens. I added the assert to make sure we
>>>>> catch this case in the future when someone violates this implicit
>>>>> contract/assumption of this method.
>>>>
>>>> So exactly this kind of situation. :)
>>>>
>>>> I don't know SelectionDAG at all, so this is a stab in the dark: does
>>>> it make sense for ExpandShiftByConstant to just return the node being
>>>> shifted if the shift amount is 0? Unless there is a strong invariant
>>>> throughout SelectionDAG w.r.t. constant folding whenever possible,
>>>> making ExpandShiftByConstant smarter should be more robust.
>>>>
>>>> -- Sanjoy
>>>>
>>>>>
>>>>> —Juergen
>>>>>
>>>>> On Mar 5, 2015, at 11:24 AM, Sanjoy Das <sanjoy at playingwithpointers.com>
>>>>> wrote:
>>>>>
>>>>> +CC Juergen since he added the assert
>>>>>
>>>>> This looks like a codegen issue. We end up with a SHL with an Amt of
>>>>> zero and trip an assert on that.
>>>>>
>>>>> $ ./bin/llc -mcpu=yonah fail.ll
>>>>>
>>>>> will reproduce the crash.
>>>>>
>>>>> I don't have the context to debug this though -- why is the assert
>>>>> expected to not trip? What is the invariant that transforms over and
>>>>> construction of SDNodes are supposed to preserve?
>>>>>
>>>>> AFAICT, in the current case, the shift amount starts off being a
>>>>> EXTRACT_VECTOR_ELT of a BUILD_VECTOR that gets constant folded to 0.
>>>>>
>>>>> The reproducing .ll file is attached.
>>>>>
>>>>> -- Sanjoy
>>>>>
>>>>> On Wed, Mar 4, 2015 at 9:36 PM, Sanjoy Das
>>>>> <sanjoy at playingwithpointers.com> wrote:
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> I'm looking at this now.
>>>>>
>>>>> -- Sanjoy
>>>>>
>>>>> On Wed, Mar 4, 2015 at 9:21 PM, Michael Zolotukhin
>>>>> <mzolotukhin at apple.com> wrote:
>>>>>
>>>>> Hi Sanjoy,
>>>>>
>>>>> This change introduced (or exposed) some assertion failures. Could you
>>>>> please take a look? I’ll try too, but you probably could find the rootcause
>>>>> faster.
>>>>>
>>>>> Reproducer:
>>>>>
>>>>> cat fail.c
>>>>>
>>>>> typedef unsigned long long uint64_t;
>>>>> uint64_t foo(unsigned char *a, unsigned l) {
>>>>> uint64_t Val = 0;
>>>>> for (unsigned i = 0; i != l; ++i)
>>>>> Val |= (uint64_t)a[i] << i*8;
>>>>> return Val;
>>>>> }
>>>>>
>>>>> clang -target i686 -march=yonah -O3 -c -gdwarf-2 fail.c
>>>>>
>>>>> Assertion failed: (Amt && "Expected zero shifts to be already optimized
>>>>> away."), function ExpandShiftByConstant, file
>>>>> …llvm.git/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp, line 1420.
>>>>> 0 clang-3.7 0x000000010d6b2e0e
>>>>> llvm::sys::PrintStackTrace(__sFILE*) + 46
>>>>> 1 clang-3.7 0x000000010d6b41cb
>>>>> PrintStackTraceSignalHandler(void*) + 27
>>>>> 2 clang-3.7 0x000000010d6b4615 SignalHandler(int) + 565
>>>>> 3 libsystem_platform.dylib 0x00007fff95515f1a _sigtramp + 26
>>>>> 4 clang-3.7 0x0000000110668d68 guard variable for
>>>>> isAllowedIDChar(unsigned int, clang::LangOptions const&)::C99AllowedIDChars
>>>>> + 97488
>>>>> 5 clang-3.7 0x000000010d6b41fb raise + 27
>>>>> 6 clang-3.7 0x000000010d6b42b2 abort + 18
>>>>> 7 clang-3.7 0x000000010d6b4291 __assert_rtn + 129
>>>>> 8 clang-3.7 0x000000010dd3e943
>>>>> llvm::DAGTypeLegalizer::ExpandShiftByConstant(llvm::SDNode*, unsigned int,
>>>>> llvm::SDValue&, llvm::SDValue&) + 163
>>>>> 9 clang-3.7 0x000000010dd3b550
>>>>> llvm::DAGTypeLegalizer::ExpandIntRes_Shift(llvm::SDNode*, llvm::SDValue&,
>>>>> llvm::SDValue&) + 240
>>>>> 10 clang-3.7 0x000000010dd311a5
>>>>> llvm::DAGTypeLegalizer::ExpandIntegerResult(llvm::SDNode*, unsigned int) +
>>>>> 3221
>>>>>
>>>>> It works for r229730 and fails for r229731.
>>>>>
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>> On Feb 18, 2015, at 11:32 AM, Sanjoy Das <sanjoy at playingwithpointers.com>
>>>>> wrote:
>>>>>
>>>>> Author: sanjoy
>>>>> Date: Wed Feb 18 13:32:25 2015
>>>>> New Revision: 229731
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=229731&view=rev
>>>>> Log:
>>>>> Partial fix for bug 22589
>>>>>
>>>>> Don't spend the entire iteration space in the scalar loop prologue if
>>>>> computing the trip count overflows. This change also gets rid of the
>>>>> backedge check in the prologue loop and the extra check for
>>>>> overflowing trip-count.
>>>>>
>>>>> Differential Revision: http://reviews.llvm.org/D7715
>>>>>
>>>>>
>>>>> Modified:
>>>>> llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp
>>>>> llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll
>>>>> llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll
>>>>> llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll
>>>>>
>>>>> Modified: llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp?rev=229731&r1=229730&r2=229731&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp (original)
>>>>> +++ llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp Wed Feb 18
>>>>> 13:32:25 2015
>>>>> @@ -58,7 +58,7 @@ STATISTIC(NumRuntimeUnrolled,
>>>>> /// - Branch around the original loop if the trip count is less
>>>>> /// than the unroll factor.
>>>>> ///
>>>>> -static void ConnectProlog(Loop *L, Value *TripCount, unsigned Count,
>>>>> +static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,
>>>>> BasicBlock *LastPrologBB, BasicBlock *PrologEnd,
>>>>> BasicBlock *OrigPH, BasicBlock *NewPH,
>>>>> ValueToValueMapTy &VMap, AliasAnalysis *AA,
>>>>> @@ -109,12 +109,19 @@ static void ConnectProlog(Loop *L, Value
>>>>> }
>>>>> }
>>>>>
>>>>> - // Create a branch around the orignal loop, which is taken if the
>>>>> - // trip count is less than the unroll factor.
>>>>> + // Create a branch around the orignal loop, which is taken if there are
>>>>> no
>>>>> + // iterations remaining to be executed after running the prologue.
>>>>> Instruction *InsertPt = PrologEnd->getTerminator();
>>>>> +
>>>>> + assert(Count != 0 && "nonsensical Count!");
>>>>> +
>>>>> + // If BECount <u (Count - 1) then (BECount + 1) & (Count - 1) == (BECount
>>>>> + 1)
>>>>> + // (since Count is a power of 2). This means %xtraiter is (BECount + 1)
>>>>> and
>>>>> + // and all of the iterations of this loop were executed by the prologue.
>>>>> Note
>>>>> + // that if BECount <u (Count - 1) then (BECount + 1) cannot
>>>>> unsigned-overflow.
>>>>> Instruction *BrLoopExit =
>>>>> - new ICmpInst(InsertPt, ICmpInst::ICMP_ULT, TripCount,
>>>>> - ConstantInt::get(TripCount->getType(), Count));
>>>>> + new ICmpInst(InsertPt, ICmpInst::ICMP_ULT, BECount,
>>>>> + ConstantInt::get(BECount->getType(), Count - 1));
>>>>> BasicBlock *Exit = L->getUniqueExitBlock();
>>>>> assert(Exit && "Loop must have a single exit block only");
>>>>> // Split the exit to maintain loop canonicalization guarantees
>>>>> @@ -291,23 +298,28 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>>
>>>>> // Only unroll loops with a computable trip count and the trip count needs
>>>>> // to be an int value (allowing a pointer type is a TODO item)
>>>>> - const SCEV *BECount = SE->getBackedgeTakenCount(L);
>>>>> - if (isa<SCEVCouldNotCompute>(BECount) ||
>>>>> !BECount->getType()->isIntegerTy())
>>>>> + const SCEV *BECountSC = SE->getBackedgeTakenCount(L);
>>>>> + if (isa<SCEVCouldNotCompute>(BECountSC) ||
>>>>> + !BECountSC->getType()->isIntegerTy())
>>>>> return false;
>>>>>
>>>>> - // If BECount is INT_MAX, we can't compute trip-count without overflow.
>>>>> - if (BECount->isAllOnesValue())
>>>>> - return false;
>>>>> + unsigned BEWidth =
>>>>> cast<IntegerType>(BECountSC->getType())->getBitWidth();
>>>>>
>>>>> // Add 1 since the backedge count doesn't include the first loop iteration
>>>>> const SCEV *TripCountSC =
>>>>> - SE->getAddExpr(BECount, SE->getConstant(BECount->getType(), 1));
>>>>> + SE->getAddExpr(BECountSC, SE->getConstant(BECountSC->getType(), 1));
>>>>> if (isa<SCEVCouldNotCompute>(TripCountSC))
>>>>> return false;
>>>>>
>>>>> // We only handle cases when the unroll factor is a power of 2.
>>>>> // Count is the loop unroll factor, the number of extra copies added + 1.
>>>>> - if ((Count & (Count-1)) != 0)
>>>>> + if (!isPowerOf2_32(Count))
>>>>> + return false;
>>>>> +
>>>>> + // This constraint lets us deal with an overflowing trip count easily;
>>>>> see the
>>>>> + // comment on ModVal below. This check is equivalent to `Log2(Count) <
>>>>> + // BEWidth`.
>>>>> + if (static_cast<uint64_t>(Count) > (1ULL << BEWidth))
>>>>> return false;
>>>>>
>>>>> // If this loop is nested, then the loop unroller changes the code in
>>>>> @@ -333,16 +345,23 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>> SCEVExpander Expander(*SE, "loop-unroll");
>>>>> Value *TripCount = Expander.expandCodeFor(TripCountSC,
>>>>> TripCountSC->getType(),
>>>>> PreHeaderBR);
>>>>> + Value *BECount = Expander.expandCodeFor(BECountSC, BECountSC->getType(),
>>>>> + PreHeaderBR);
>>>>>
>>>>> IRBuilder<> B(PreHeaderBR);
>>>>> Value *ModVal = B.CreateAnd(TripCount, Count - 1, "xtraiter");
>>>>>
>>>>> - // Check if for no extra iterations, then jump to cloned/unrolled loop.
>>>>> - // We have to check that the trip count computation didn't overflow when
>>>>> - // adding one to the backedge taken count.
>>>>> - Value *LCmp = B.CreateIsNotNull(ModVal, "lcmp.mod");
>>>>> - Value *OverflowCheck = B.CreateIsNull(TripCount, "lcmp.overflow");
>>>>> - Value *BranchVal = B.CreateOr(OverflowCheck, LCmp, "lcmp.or");
>>>>> + // If ModVal is zero, we know that either
>>>>> + // 1. there are no iteration to be run in the prologue loop
>>>>> + // OR
>>>>> + // 2. the addition computing TripCount overflowed
>>>>> + //
>>>>> + // If (2) is true, we know that TripCount really is (1 << BEWidth) and so
>>>>> the
>>>>> + // number of iterations that remain to be run in the original loop is a
>>>>> + // multiple Count == (1 << Log2(Count)) because Log2(Count) <= BEWidth
>>>>> (we
>>>>> + // explicitly check this above).
>>>>> +
>>>>> + Value *BranchVal = B.CreateIsNotNull(ModVal, "lcmp.mod");
>>>>>
>>>>> // Branch to either the extra iterations or the cloned/unrolled loop
>>>>> // We will fix up the true branch label when adding loop body copies
>>>>> @@ -365,10 +384,7 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>> std::vector<BasicBlock *> NewBlocks;
>>>>> ValueToValueMapTy VMap;
>>>>>
>>>>> - // If unroll count is 2 and we can't overflow in tripcount computation
>>>>> (which
>>>>> - // is BECount + 1), then we don't need a loop for prologue, and we can
>>>>> unroll
>>>>> - // it. We can be sure that we don't overflow only if tripcount is a
>>>>> constant.
>>>>> - bool UnrollPrologue = (Count == 2 && isa<ConstantInt>(TripCount));
>>>>> + bool UnrollPrologue = Count == 2;
>>>>>
>>>>> // Clone all the basic blocks in the loop. If Count is 2, we don't clone
>>>>> // the loop, otherwise we create a cloned loop to execute the extra
>>>>> @@ -394,7 +410,7 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>> // Connect the prolog code to the original loop and update the
>>>>> // PHI functions.
>>>>> BasicBlock *LastLoopBB = cast<BasicBlock>(VMap[Latch]);
>>>>> - ConnectProlog(L, TripCount, Count, LastLoopBB, PEnd, PH, NewPH, VMap,
>>>>> + ConnectProlog(L, BECount, Count, LastLoopBB, PEnd, PH, NewPH, VMap,
>>>>> /*AliasAnalysis*/ nullptr, DT, LI, LPM->getAsPass());
>>>>> NumRuntimeUnrolled++;
>>>>> return true;
>>>>>
>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll?rev=229731&r1=229730&r2=229731&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll (original)
>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll Wed Feb 18
>>>>> 13:32:25 2015
>>>>> @@ -4,9 +4,7 @@
>>>>>
>>>>> ; CHECK: %xtraiter = and i32 %n
>>>>> ; CHECK: %lcmp.mod = icmp ne i32 %xtraiter, 0
>>>>> -; CHECK: %lcmp.overflow = icmp eq i32 %n, 0
>>>>> -; CHECK: %lcmp.or = or i1 %lcmp.overflow, %lcmp.mod
>>>>> -; CHECK: br i1 %lcmp.or, label %for.body.prol, label
>>>>> %for.body.preheader.split
>>>>> +; CHECK: br i1 %lcmp.mod, label %for.body.prol, label
>>>>> %for.body.preheader.split
>>>>>
>>>>> ; CHECK: for.body.prol:
>>>>> ; CHECK: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body.prol
>>>>> ], [ 0, %for.body.preheader ]
>>>>>
>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll?rev=229731&r1=229730&r2=229731&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll (original)
>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll Wed Feb 18
>>>>> 13:32:25 2015
>>>>> @@ -3,7 +3,7 @@
>>>>> ; This tests that setting the unroll count works
>>>>>
>>>>> ; CHECK: for.body.prol:
>>>>> -; CHECK: br i1 %prol.iter.cmp, label %for.body.prol, label
>>>>> %for.body.preheader.split
>>>>> +; CHECK: br label %for.body.preheader.split
>>>>> ; CHECK: for.body:
>>>>> ; CHECK: br i1 %exitcond.1, label %for.end.loopexit.unr-lcssa, label
>>>>> %for.body
>>>>> ; CHECK-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label
>>>>> %for.body
>>>>>
>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll?rev=229731&r1=229730&r2=229731&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll (original)
>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll Wed Feb 18
>>>>> 13:32:25 2015
>>>>> @@ -1,19 +1,28 @@
>>>>> ; RUN: opt < %s -S -unroll-runtime -unroll-count=2 -loop-unroll | FileCheck
>>>>> %s
>>>>> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>>>>
>>>>> -; When prologue is fully unrolled, the branch on its end is unconditional.
>>>>> -; Unrolling it is illegal if we can't prove that trip-count+1 doesn't
>>>>> overflow,
>>>>> -; like in this example, where it comes from an argument.
>>>>> -;
>>>>> -; This test is based on an example from here:
>>>>> -;
>>>>> http://stackoverflow.com/questions/23838661/why-is-clang-optimizing-this-code-out
>>>>> -;
>>>>> +; This test case documents how runtime loop unrolling handles the case
>>>>> +; when the backedge-count is -1.
>>>>> +
>>>>> +; If %N, the backedge-taken count, is -1 then %0 unsigned-overflows
>>>>> +; and is 0. %xtraiter too is 0, signifying that the total trip-count
>>>>> +; is divisible by 2. The prologue then branches to the unrolled loop
>>>>> +; and executes the 2^32 iterations there, in groups of 2.
>>>>> +
>>>>> +
>>>>> +; CHECK: entry:
>>>>> +; CHECK-NEXT: %0 = add i32 %N, 1
>>>>> +; CHECK-NEXT: %xtraiter = and i32 %0, 1
>>>>> +; CHECK-NEXT: %lcmp.mod = icmp ne i32 %xtraiter, 0
>>>>> +; CHECK-NEXT: br i1 %lcmp.mod, label %while.body.prol, label %entry.split
>>>>> +
>>>>> ; CHECK: while.body.prol:
>>>>> -; CHECK: br i1
>>>>> +; CHECK: br label %entry.split
>>>>> +
>>>>> ; CHECK: entry.split:
>>>>>
>>>>> ; Function Attrs: nounwind readnone ssp uwtable
>>>>> -define i32 @foo(i32 %N) #0 {
>>>>> +define i32 @foo(i32 %N) {
>>>>> entry:
>>>>> br label %while.body
>>>>>
>>>>> @@ -26,5 +35,3 @@ while.body:
>>>>> while.end: ; preds = %while.body
>>>>> ret i32 %i
>>>>> }
>>>>> -
>>>>> -attributes #0 = { nounwind readnone ssp uwtable
>>>>> "less-precise-fpmad"="false" "no-frame-pointer-elim"="true"
>>>>> "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false"
>>>>> "no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
>>>>> "unsafe-fp-math"="false" "use-soft-float"="false" }
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>>
>>>>> <fail.ll>
>>>>>
>>>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list