[PATCH] D17201: [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV

Mon Mar 14 05:17:38 PDT 2016

sbaranga added a comment.

Hi Sanjoy,

Thanks for the comments! Some replies inline.

Regarding testing getAsAddRec, we'll most likely have to do it in a pass like LoopAccessAnalysis, since the call takes a Loop parameter. We already do something similar in test/Analysis/LoopAccessAnalysis/wrapping-pointer-versioning.ll, although we just check the resulting predicates there (it would be nice to also get the initial and final expressions).

Thanks,
Silviu


================
Comment at: include/llvm/Analysis/ScalarEvolution.h:557
@@ -545,1 +556,3 @@
+      /// otherwise it's all CouldNotCompute.
+      SCEVUnionPredicate Pred;
 
----------------
sanjoy wrote:
> Now that you're embedding `SCEVUnionPredicate` in every
> `ExitNotTakenInfo`, I'm somewhat worried about the `SmallVector<const
> SCEVPredicate *, 16> Preds;` :)
> 
> Firstly, what is the typical number of predicates we see?  16 seems
> higher that what I would expect.
> 
> Secondly, a better (in terms of memory consumption) solution would be
> to have a `std::unique_ptr<SCEVUnionPredicate>` here, where a null
> value means "always true" so that in the non-predicated case we only
> use up one word.
> 
> The most compact solution (I think we should go with this) is to:
> 
>  - Extract out a `struct ExitNotTakenExtras` that contains a
>    `SmallVector` of `ExitNotTaken` s and a `SCEVUnionPredicate`
> 
>  - Change `PointerIntPair<ExitNotTakenInfo*, 1> NextExit` to
>    `PointerIntPair<ExitNotTakenExtras *, 1> NextExit` where it is
>    usually null, but if it is non-null you either have multiple exits
>    or have some SCEV predicates.
> 
> Normally I wouldn't worry so much about space, but `ExitNotTakenInfo`
> has clearly been optimized for space usage, so we should try not to
> break that.
Ok, the most common size of the SCEVUnionPredicate there should be 0, so going with your solution makes sense to me.

On the other hand, it does seem a bit strange to optimize this for memory consumption. I wouldn't expect it to make that much of a difference.

================
Comment at: lib/Analysis/ScalarEvolution.cpp:5488
@@ -5436,1 +5487,3 @@
       ExitCounts.push_back({ExitBB, EL.Exact});
+      ExitCountPreds.push_back(&EL.Pred);
+    }
----------------
sanjoy wrote:
> This does not look right to me -- won't the `EL` be freed after each iteration, leaving a dangling pointer in `ExitCountPreds`?
Of course, thanks for catching this!

================
Comment at: test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll:54
@@ -53,3 +53,3 @@
 for.body:                                         ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv, !dbg !16
----------------
sanjoy wrote:
> sbaranga wrote:
> > I had to change this test because it started to figure out what the backedge taken count was.
> > 
> > This was testing the vectorization remark when we cannot find the backedge taken count. changed the test so that it will continue not be able to get the backedge taken count.
> > 
> > What is interesting about this is the way we've managed to get the (exact) backedge taken count. The initial analysis was only able to get the maximum backedge taken count but not the exact one. However, we can use this to get a better SCEV for cmp3.
> > 
> > On a following invocation of computeBackedgeTakenCount this information is used to get an exact backedge taken count.
> Interesting.  Do you mean "However, we can use this to get a better
> SCEV for `%0`"?  I can see how `SimplifyICmpOperands` would be able to
> use a tighter range on `%0` to simplify the `sle` into an `slt`.
> 
> It would be great if SCEV could directly compute the backedge count
> here though.  The problem is that we didn't have a way for
> `computeExitLimitFromCond` to say "BECount is Infinite if `L` is
> `INT_MAX` else is `L + 1` (say)" so `computeExitLimitFromCond` would
> have to give up in the face of the dreaded `COULDNOTCOMPUTE`.  But
> with your work, that is changing (ability to represent predicated BE
> counts); and perhaps one day SCEV will be able to directly compute the
> backedge taken counts of loop like these. :)
> 
Yes, that is basically what is happening (we can now use the range information on %0).

You're correct, with these changes we will have a way of doing the sle/ule to slt/ult conversions even without the appropriate range information. In fact that would be something really nice to have (I've even seen people hitting this issue on llvm-dev, so maybe this is not just a corner case).


http://reviews.llvm.org/D17201