[PATCH] D12073: Make ScalarEvolution::isKnownPredicate a little smarter

Mon Aug 17 23:51:52 PDT 2015

hfinkel added a comment.

In http://reviews.llvm.org/D12073#226385, @sanjoy wrote:

> In http://reviews.llvm.org/D12073#226151, @hfinkel wrote:
>
> >
>
>
>
>
> > There are now three failing tests, mostly due to IV widening that now does not happen. The new known-predicate results are:
>
> > 
>
> > test/Transforms/IndVarSimplify/elim-extend.ll:
>
> > 
>
> > - {%init,+,1}<nsw><%loop> sle {(1 + %init),+,1}<nsw><%loop>
>
> > - {(sext i32 %init to i64),+,1}<nsw><%loop> sle {(1 + (sext i32 %init to i64)),+,1}<nsw><%loop>
>
> > - {-1,+,1}<nsw><%outerloop> sle {1,+,1}<nuw><%outerloop>
>
> > - {-1,+,1}<nsw><%outerloop> sle {1,+,1}<nuw><nsw><%outerloop>
>
> > 
>
> >   test/Transforms/IndVarSimplify/iv-sext.ll:
>
> > - {-1,+,1}<nsw><%bb> sle {1,+,1}<nuw><%bb>
>
> > - {-1,+,1}<nsw><%bb> sle {1,+,1}<nuw><nsw><%bb>
>
> > 
>
> >   test/Transforms/IndVarSimplify/strengthen-overflow.ll:
>
> > - {%init,+,1}<nsw><%loop> sle {(1 + %init),+,1}<nsw><%loop>
>
> > 
>
> >   These look right, but some consumer of the information seems to make a worse no-wrap deductions as a result (thus, this patch is still WIP).
>
>
> I suspect (haven't verified) this is due to SCEV's caching mechanism. You end up creating a "Sext(LHS)" too soon, before proving all of the things needed to show that LHS is nsw.  Then this suboptimal result gets cached.
>
> In other words, I suspect the stack trace looks like this (youngest frame at top) initially:
>
>   == false because of PendingLoopPredicates, set in frame (A)
>   getImpliedCond(...)  // ends up looking at VAL
>   getLoopBackedgeCount(...)
>   getSignExtend(VAL)  // due to your change
>   getImpliedCond(...)  // ends up looking at VAL .. (A)
>   getLoopBackedgeCount(...)
>   getSignExtend(VAL)
>   
>
> This goes on as
>
>     getSignExtend(VAL)  ==> no simplification, and we end up with
>           UniqueSCEVs[(SignExtend, VAL)] = SignExtendExpr VAL, and
>   	does not set the <nsw> bit in VAL
>     getImpliedCond(...)  // ends up looking at VAL .. (A)
>     getLoopBackedgeCount(...)
>     getSignExtend(VAL)
>
>
> And then finally the oldest frame returns:
>
>   getSignExtend(VAL)  // this returns a widened version of VAL (instead of
>     an explicit sign extend), and also set VAL's nsw bit
>   
>
> However, the damage has been done since `UniqueSCEVs[(SignExtend, VAL)] == (SignExtendExpr VAL)` and that's the value `ScalarEvolution::getSignExtendExpr`  will return from now on.
>
> If you implement your check without actually creating a sign extend, but by directly inferring no-overflow from the nsw or nuw flags, then I suspect this won't happen.

Looking at the test/Transforms/IndVarSimplify/strengthen-overflow.ll test case, for example, here's (in part) what is going on, at least in terms of initial behavior:

SimplifyIndvar::strengthenOverflowingOperation has code that looks like this:

  if (!BO->hasNoUnsignedWrap()) {
    const SCEV *ExtendAfterOp = SE->getZeroExtendExpr(SE->getSCEV(BO), WideTy);
    const SCEV *OpAfterExtend = (SE->*GetExprForBO)(
      SE->getZeroExtendExpr(LHS, WideTy), SE->getZeroExtendExpr(RHS, WideTy),
      SCEV::FlagAnyWrap);
    if (ExtendAfterOp == OpAfterExtend) {
      BO->setHasNoUnsignedWrap();
      SE->forgetValue(BO);
      Changed = true;
    }
  }

  if (!BO->hasNoSignedWrap()) {
    const SCEV *ExtendAfterOp = SE->getSignExtendExpr(SE->getSCEV(BO), WideTy);
    const SCEV *OpAfterExtend = (SE->*GetExprForBO)(
      SE->getSignExtendExpr(LHS, WideTy), SE->getSignExtendExpr(RHS, WideTy),
      SCEV::FlagAnyWrap);
    if (ExtendAfterOp == OpAfterExtend) {
      BO->setHasNoSignedWrap();
      SE->forgetValue(BO);
      Changed = true;
    }
  }

Now the fun part is that the new code actually triggers as part of the checks done under `if (!BO->hasNoUnsignedWrap())`, but we can't prove nuw regardless, but it changes things such that the code under `if (!BO->hasNoSignedWrap())` no longer works as well as it did previously.

We start with LHS = {%init,+,1}<%loop>, RHS = 1, and as part of computing OpAfterExtend (in the unsigned case), we hit this part of ScalarEvolution::getSignExtendExpr:

    // If the backedge is guarded by a comparison with the pre-inc value
    // the addrec is safe. Also, if the entry is guarded by a comparison
    // with the start value and the backedge is guarded by a comparison
    // with the post-inc value, the addrec is safe.
    ICmpInst::Predicate Pred;
    const SCEV *OverflowLimit =
        getSignedOverflowLimitForStep(Step, &Pred, this);
    if (OverflowLimit &&
        (isLoopBackedgeGuardedByCond(L, Pred, AR, OverflowLimit) ||
         (isLoopEntryGuardedByCond(L, Pred, Start, OverflowLimit) &&
          isLoopBackedgeGuardedByCond(L, Pred, AR->getPostIncExpr(*this),
                                      OverflowLimit)))) {
      // Cache knowledge of AR NSW, then propagate NSW to the wide AddRec.
      const_cast<SCEVAddRecExpr *>(AR)->setNoWrapFlags(SCEV::FlagNSW);
      return getAddRecExpr(
          getExtendAddRecStart<SCEVSignExtendExpr>(AR, Ty, this),
          getSignExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
    }
  }

and this cached nsw flag ends up on our RHS, so we now have:

LHS = {%init,+,1}<nsw><%loop>, RHS = 1

and so the call to `SE->getSignExtendExpr(LHS, WideTy)` now returns:

i32 {%init,+,1}<nsw><%loop> to i64)

instead of returning:

{(sext i32 %init to i64),+,1}<nsw><%loop> (as it does without this patch).

as a result, we get:

ExtendAfterOp == {(1 + (sext i32 %init to i64)),+,1}<nsw><%loop>
OpAfterExtend == (1 + (sext i32 {%init,+,1}<nsw><%loop> to i64))

instead of:

ExtendAfterOp == OpAfterExtend == {(1 + (sext i32 %init to i64)),+,1}<nsw><%loop>

interestingly, the results in the nuw case are also slightly different (although in neither case are equal because we can't provide nuw regardless). With the patch we get this:

ExtendAfterOp == (zext i32 {(1 + %init),+,1}<nsw><%loop> to i64)
OpAfterExtend == (1 + (zext i32 {%init,+,1}<nsw><%loop> to i64))

whereas currently we get:

ExtendAfterOp == (zext i32 {(1 + %init),+,1}<%loop> to i64)
OpAfterExtend == (1 + (zext i32 {%init,+,1}<%loop> to i64))

(so we get some extra <nsw>s for our trouble)

http://reviews.llvm.org/D12073