[PATCH] D60935: [IndVarSimplify] Fixup nowrap flags during LFTR when moving to post-inc (PR31181)

Philip Reames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 28 16:01:15 PDT 2019

reames requested changes to this revision.
reames added a comment.
This revision now requires changes to proceed.

Ok, after reading through the code and trying out a couple of ideas, I'm fairly sure I understand what's going on here.  There are actually (at least) three distinct bugs in the same logic.  The core thing to keep in mind is that we truncate the IV to the backedge-taken-count type, then perform the math in that type.  The trivial alternative would be to extend both to the same wider type, but that results in markedly worse codegen.  (Still exploring whether that's fundamental.)

The first bug is that if we have a backedge-taken count of uint_max<T>, then adding 1 wraps to 0.  This results in the truncated limit instruction being simply trunc(IV.Start), which, while correct as the exit value, is *also* the starting value.  I haven't been able to write a test case that demonstrates this yet.

The second bug is that once we (correctly) compute the exit value, we rewrite the exit test as IV + 1 Cmp Limit.  The problem is that poison is defined such that *branching on* a poison result is what triggers UB.  We could therefore have had an nsw IV whose post-increment was poison without being UB (since we tested against the pre-increment value, which wasn't).  As such, unless we can prove that (IV+1) isn't poison, we have three choices:

1. Don't convert the post increment, but do the pre-increment version only.
2. Strip nsw/nuw from the increment.  (This is what the current patch does.)
3. Bail out of the transform in this case entirely.

The third bug is the dead IV case mentioned in the patch comments.  Good catch; I'd missed that one until rereading the patch.  Nikita, can you list a specific test case which exercises this?  I'd have thought the hasConcreteDef case would have caught it?

At the moment, I believe the patch posted is correct, but is unnecessarily conservative.  Instead of only stripping nsw/nuw where necessary, it *always* strips.  A simple but noteworthy improvement would be to use SCEV's getSignedRange/getUnsignedRange to prove the increment can't overflow, and strip only in the cases where it might.

I'd also prefer to see us simply refuse to switch to post-increment rather than strip the nsw/nuw.  That seems like the generally better tradeoff for a middle-end transform.  LSR (or some other late transform) could do the flag-stripping version if needed.  (I doubt it would be.)

Comment at: llvm/lib/Transforms/Scalar/IndVarSimplify.cpp:2388
+  Value *IncVar = IndVar->getIncomingValueForBlock(L->getLoopLatch());
+  if (auto *BO = dyn_cast<BinaryOperator>(IncVar)) {
+    const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IncVar));
This test can never fail given the definition of LoopCounter used.

