[PATCH] D125301: [LoopVectorize] Add option to use active lane mask for loop control flow

Mon Jun 13 01:12:41 PDT 2022

david-arm added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8712
+
+    auto *ALM = new VPInstruction(VPInstruction::ActiveLaneMask,
+                                  {CanonicalIVIncrementParts, TC}, DL);
----------------
fhahn wrote:
> david-arm wrote:
> > fhahn wrote:
> > > It looks like the update for the phi doesn't depend on the phi, but only on trip count & main induction. 
> > > 
> > > I might be missing something, but is the phi actually needed or would it be possible to compute the active lane mask at the beginning of each iteration instead of at the end of the iteration?
> > Hi @fhahn, perhaps I've misunderstood your question here, but actually the point of this patch is to do the exact opposite of that, i.e. we *don't* want to generate the active lane mask for the current iteration - we want to create the mask for the next iteration. This requires a PHI to carry the live value around the loop and is the only way to use the mask for control flow because at the point of branching we want to know if there are any active lanes in the mask for the *next* iteration.
> > 
> > With particular reference to SVE, the motivation for this work is to use the 'whilelo' instruction to both generate the lane mask and set the flags to branch on. Effectively, the whilelo instruction is doing the comparison already, which makes the traditional scalar IV comparison redundant.
> > 
> > I could be wrong, but I believe this form of vectorised loop would be beneficial for some other targets with a predicated instruction set too, such as RISC-V.
> > With particular reference to SVE, the motivation for this work is to use the 'whilelo' instruction to both generate the lane mask and set the flags to branch on. 
> 
> Oh right, I missed this in the test case I was looking at! I originally thought the intention of the new phi recipe was to encode some extra guarantees/information like we do for inductions or reductions. 
> 
> From the latest comment, it sounds like there would be no need to have a new recipe class for the phi, but perhaps VPWidenPHIRecipe could be used instead, if the setup can be moved to the pre-header.
I did try exactly this initially, but I think VPWidenPHIRecipe requires an underlying scalar instruction to widen, because of its execute function in LoopVectorize.cpp:

  void VPWidenPHIRecipe::execute(VPTransformState &State) {
    State.ILV->widenPHIInstruction(cast<PHINode>(getUnderlyingValue()), this,
                                   State);
  }

but the PHI node does not exist in the original scalar loop. It's not currently possible to widen a PHI that didn't previously exist, which means I would have to modify VPWidenPHIRecipe to test for the existence of an underlying value and take different paths accordingly.
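
To make the loop form discussed earlier in this thread concrete, here is a rough scalar model in plain C++. It is not code from the patch and does not use any LLVM API. The names activeLaneMask, anyActive and incrementAll, and the fixed width VF = 4, are all made up for illustration. The variable `mask` plays the role of the PHI: it is set up before the loop, consumed by the body, and then overwritten with the mask for the *next* iteration, which is also what the back-edge branch tests.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed fixed vector width, purely for illustration.
constexpr std::size_t VF = 4;
using Mask = std::array<bool, VF>;

// Hypothetical helper: lane i is active iff Base + i < TripCount.
// (Overflow of Base + i is ignored in this sketch.)
Mask activeLaneMask(std::uint64_t Base, std::uint64_t TripCount) {
  Mask M{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    M[Lane] = Base + Lane < TripCount;
  return M;
}

// Hypothetical helper: true if at least one lane is active.
bool anyActive(const Mask &M) {
  for (bool Active : M)
    if (Active)
      return true;
  return false;
}

// Adds 1 to every element of Data and returns the number of
// vector iterations executed.
std::size_t incrementAll(std::vector<int> &Data) {
  const std::uint64_t TC = Data.size();
  if (TC == 0)
    return 0; // Guard: skip the loop when there is nothing to do.

  std::size_t Iterations = 0;
  std::uint64_t Index = 0;
  // "Preheader": compute the mask for the first iteration.
  Mask M = activeLaneMask(Index, TC);
  do {
    // Predicated body: only active lanes touch memory.
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      if (M[Lane])
        Data[Index + Lane] += 1;
    Index += VF;
    // Compute the mask for the NEXT iteration. This value is carried
    // around the loop, like the PHI, and doubles as the exit test.
    M = activeLaneMask(Index, TC);
    ++Iterations;
  } while (anyActive(M));
  return Iterations;
}
```

Note that there is no separate comparison of Index against TC for the branch; the exit condition comes entirely from the mask computed for the next iteration, which is the property the patch is after.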

https://reviews.llvm.org/D125301