[PATCH] D79175: [ARM][MVE] Tail-Predication: use @llvm.get.active.lane.mask to get the BTC
    Eli Friedman via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Tue Jun  2 17:35:14 PDT 2020
    
    
  
efriedma added inline comments.
================
Comment at: llvm/lib/Target/ARM/MVETailPredication.cpp:382
   // TODO: This add likely already exists in the loop.
   Value *Remaining = Builder.CreateSub(Processed, Factor);
   Processed->addIncoming(Remaining, L->getLoopLatch());
----------------
SjoerdMeijer wrote:
> efriedma wrote:
> > SjoerdMeijer wrote:
> > > samparker wrote:
> > > > efriedma wrote:
> > > > > efriedma wrote:
> > > > > > samparker wrote:
> > > > > > > efriedma wrote:
> > > > > > > > This subtraction can also overflow.
> > > > > > > But that's okay, right? This predication is only really useful when wrapping happens and the intrinsic reflects that overflow can/will happen.
> > > > > > The problem here is that the original get_active_lane_mask can do something like this:
> > > > > > 
> > > > > > Iteration 1: get_active_lane_mask(0, 5) -> all-true
> > > > > > Iteration 2: get_active_lane_mask(4, 5) -> <true, false, false, false>
> > > > > > Iteration 3: get_active_lane_mask(8, 5) -> all-false
> > > > > > 
> > > > > > In the rewritten code, you end up with this:
> > > > > > 
> > > > > > Iteration 1: vctp(5) -> all-true
> > > > > > Iteration 2: vctp(1)- > <true, false, false, false>
> > > > > > Iteration 3: vctp(-3)- > all-true
> > > > > > 
> > > > > > There are a couple ways you could deal with this:
> > > > > > 
> > > > > > 1. Use saturating subtraction
> > > > > > 2. Prove the trip count is small enough that this never happens, i.e. `ceil(ElementCount / VectorWidth) >= TripCount`.
> > > > > Using saturating subtraction would require proving that the induction variable used as the first argument to llvm.get.active.mask doesn't overflow.  But I guess you have to check that anyway.
> > > > I would be hesitate to introduce saturating math here. I think the reasonable assumption is that the sub will wrap, but only on the final iteration. So just asserting that the element count is within the bounds of the trip count should be fine. 
> > > > The problem here is that the original get_active_lane_mask can do something like this:
> > > >
> > > > Iteration 1: get_active_lane_mask(0, 5) -> all-true
> > > > Iteration 2: get_active_lane_mask(4, 5) -> <true, false, false, false>
> > > > Iteration 3: get_active_lane_mask(8, 5) -> all-false
> > > 
> > > In general, I see the problem, and I see that this can happen. But here in this context, I was wondering if it not actually boils down to Sam's earlier remark about countable loops. That is, before we were pattern matching a particular pattern produced by the vectoriser, we are handling (vector) loops produced by the vectoriser. Thus, we will never execute Iteration #3 from the example above, because `ceil(ElementCount / VectorWidth) >= TripCount` will always hold for these loops.
> > > 
> > > My question is, with the intrinsic approach, can we still rely on that, would that be a valid assumption to make? Along these same lines, this pass also relies on a check `IsPredicatedVectorLoop` and presence of masked loads/stores currently produced by the vectoriser, which gets its predicate from @llvm.get.active.lane.mask also generated by the vectoriser.
> > It's not safe to assume get_active_lane_mask was produced by the LLVM vectorizer.  I mean, the ACLE has a vctp intrinsic; it's not that much of a stretch that we could expose get_active_lane_mask to users at some point.
> > 
> > And even if it was produced by the LLVM vectorizer, other passes that can modify the loop structure run between the vectorizer and the MVETailPredication pass.
> > 
> > And even if you're looking at the unmodified vectorizer output, you still need to verify the "L" is actually the loop that was vectorized.
> > 
> > In summary, I don't think it's a good idea to make assumptions beyond what LangRef actually promises.  It's still a lot easier to pattern-match than it would be without the intrinsic.
> Ok, cheers, got it.
> 
> > It's still a lot easier to pattern-match than it would be without the intrinsic.
> 
> Yep, will have a go at this next week.
(It looks like the current version doesn't try to address the potential overflow in the subtraction.)
================
Comment at: llvm/lib/Target/ARM/MVETailPredication.cpp:354
+  // tripcount expression BTC + 1 is not safe.
+  if (llvm::isKnownNegativeInLoop(ScalarTripCount, L, *SE)) {
+    LLVM_DEBUG(dbgs() << "ARM TP: overflow detected in: ";
----------------
I think you want llvm::cannotBeMaxInLoop?
================
Comment at: llvm/lib/Target/ARM/MVETailPredication.cpp:378
+    return false;
+  }
+  auto StepValue = Step->getValue()->getSExtValue();
----------------
You also need to check that the AddRec is associated with the right loop.
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79175/new/
https://reviews.llvm.org/D79175
    
    
More information about the llvm-commits
mailing list