[PATCH] D79175: [ARM][MVE] Tail-Predication: use @llvm.get.active.lane.mask to get the BTC

Fri May 29 08:06:55 PDT 2020

SjoerdMeijer added inline comments.

================
Comment at: llvm/lib/Target/ARM/MVETailPredication.cpp:382
   // TODO: This add likely already exists in the loop.
   Value *Remaining = Builder.CreateSub(Processed, Factor);
   Processed->addIncoming(Remaining, L->getLoopLatch());
----------------
samparker wrote:
> efriedma wrote:
> > efriedma wrote:
> > > samparker wrote:
> > > > efriedma wrote:
> > > > > This subtraction can also overflow.
> > > > But that's okay, right? This predication is only really useful when wrapping happens and the intrinsic reflects that overflow can/will happen.
> > > The problem here is that the original get_active_lane_mask can do something like this:
> > > 
> > > Iteration 1: get_active_lane_mask(0, 5) -> all-true
> > > Iteration 2: get_active_lane_mask(4, 5) -> <true, false, false, false>
> > > Iteration 3: get_active_lane_mask(8, 5) -> all-false
> > > 
> > > In the rewritten code, you end up with this:
> > > 
> > > Iteration 1: vctp(5) -> all-true
> > > Iteration 2: vctp(1)- > <true, false, false, false>
> > > Iteration 3: vctp(-3)- > all-true
> > > 
> > > There are a couple ways you could deal with this:
> > > 
> > > 1. Use saturating subtraction
> > > 2. Prove the trip count is small enough that this never happens, i.e. `ceil(ElementCount / VectorWidth) >= TripCount`.
> > Using saturating subtraction would require proving that the induction variable used as the first argument to llvm.get.active.mask doesn't overflow.  But I guess you have to check that anyway.
> I would be hesitate to introduce saturating math here. I think the reasonable assumption is that the sub will wrap, but only on the final iteration. So just asserting that the element count is within the bounds of the trip count should be fine. 
> The problem here is that the original get_active_lane_mask can do something like this:
>
> Iteration 1: get_active_lane_mask(0, 5) -> all-true
> Iteration 2: get_active_lane_mask(4, 5) -> <true, false, false, false>
> Iteration 3: get_active_lane_mask(8, 5) -> all-false

In general, I see the problem, and I see that this can happen. But here in this context, I was wondering if it not actually boils down to Sam's earlier remark about countable loops. That is, before we were pattern matching a particular pattern produced by the vectoriser, we are handling (vector) loops produced by the vectoriser. Thus, we will never execute Iteration #3 from the example above, because `ceil(ElementCount / VectorWidth) >= TripCount` will always hold for these loops.

My question is, with the intrinsic approach, can we still rely on that, would that be a valid assumption to make? Along these same lines, this pass also relies on a check `IsPredicatedVectorLoop` and presence of masked loads/stores currently produced by the vectoriser, which gets its predicate from @llvm.get.active.lane.mask also generated by the vectoriser.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79175/new/

https://reviews.llvm.org/D79175