[PATCH] D86074: [ARM][MVE] Tail-predication: check get.active.lane.mask's TC value

Thu Sep 17 14:26:01 PDT 2020

efriedma added inline comments.

================
Comment at: llvm/lib/Target/ARM/MVETailPredication.cpp:376

-  // 1) TODO: Check that the TripCount (TC) belongs to this loop (originally).
+  // 1) Check that the original scalar loop TripCount (TC) belongs to this loop.
   // The scalar tripcount corresponds the number of elements processed by the
----------------
SjoerdMeijer wrote:
> efriedma wrote:
> > SjoerdMeijer wrote:
> > > SjoerdMeijer wrote:
> > > > efriedma wrote:
> > > > > Why do we need this check?  Emitting vctp32 should be okay even if we can't actually tail-predicate the loop.  The overflow check should be enough to ensure that's it's safe to emit vctp32, I think?  Or am I forgetting somthing?
> > > > I could have a look where exactly, but if I am not mistaken you suggested on of the previous patches that we need to check that this tripcount/elementcount actually belongs to this loop. similarly like we already did for the IV. The reasoning was that for now get.active.lane.mask is emitted from the vector for nicely behaving loops, but it wouldn't be difficult to imagine that soon we will have a corresponding user-facing intrinsic. I think I am quoting that, if I remember well, and so these checks are needed.
> > > > 
> > > > And if we emit the VCTP, then that represents tail-predication. I.e., the VCTP intrinsic can be picked up in the LoweroverheadLoop pass and turned into a tail-predicated loop (after additional checks).
> > > I did have a look because I was curious if I had starting imaging things. This is the remark I had in mind:
> > > 
> > > https://reviews.llvm.org/D79175#2063586
> > > 
> > > This is remark is explicitly about "L" though. And I thought there was a similar remark about the 2nd argument when it still was the BTC (previous version of this patch), but I don't think I can't find that easily now.
> > Let me try to reformulate in a different way that might make it easier to understand. What you're doing here has two essential steps:
> > 
> > 1. Convert "llvm.get.active.lane.mask(X, Y)" to "llvm.arm.mve.vctp(Y - X)".
> > 2. Convert "Y - X" to a simpler induction variable.
> > 
> > In theory, you could split these steps; (1) could be legal even without (2).
> > 
> > Step 1 doesn't depend on the loop, or even that the statement is in a loop at all.  The only requirement is that the subtraction itself doesn't overflow.
> > 
> > Step 2 requires that "Y - X" is equivalent to the new induction variable: it's needs to be an AddRec for the loop you're inserting the PHI into,  and the generated PHI has to have the same base and increment.
> > 
> > Neither of these are should be directly connected to the trip count of the loop, I think.
> > 
> > The way the code is currently written, I think you're trying to prove more than you actually need to.  If the induction variable has the "wrong" base or increment, ARMLowOverheadLoops will ultimately fail to tail-predicate, but I'm not sure that's actually a problem.
> Thanks again Eli for explaining/elaborating.
> 
> Let me know what you prefer or think is best: rip this particular bit out (revert it), or leave it for the moment. I am asking because I will need some time to have a look at this:
> 
> > Step 2 requires that "Y - X" is equivalent to the new induction variable: it's needs to be an AddRec for the loop you're inserting the PHI into, and the generated PHI has to have the same base and increment.
> 
> This "Y - X" expression is a difficult one to analyse (it can be), and I need to see how to do that.
> Let me know what you prefer or think is best: rip this particular bit out (revert it), or leave it for the moment. I am asking because I will need some time to have a look at this:

Post the patches in whatever order you think makes sense for review.

> > Step 2 requires that "Y - X" is equivalent to the new induction variable: it's needs to be an AddRec for the loop you're inserting the PHI into, and the generated PHI has to have the same base and increment.
> 
> This "Y - X" expression is a difficult one to analyse (it can be), and I need to see how to do that.

Step 2 should be easy.  In the cases you're interested in, the "Y - X" SCEV expression should look something like `{ElementCount,+,-VectorWidth}`.  VectorWidth is a constant, and you don't really need to analyze ElementCount.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86074/new/

https://reviews.llvm.org/D86074