[PATCH] D125301: [LoopVectorize] Add option to use active lane mask for loop control flow

Mon Jun 13 01:05:27 PDT 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8712
+
+    auto *ALM = new VPInstruction(VPInstruction::ActiveLaneMask,
+                                  {CanonicalIVIncrementParts, TC}, DL);
----------------
david-arm wrote:
> fhahn wrote:
> > It looks like the update for the phi doesn't depend on the phi, but only on trip count & main induction. 
> > 
> > I might be missing something, but is the phi actually needed or would it be possible to compute the active lane mask at the beginning of each iteration instead of at the end of the iteration?
> Hi @fhahn, perhaps I've misunderstood your question here, but actually the point of this patch is to do the exact opposite of that, i.e. we *don't* want to generate the active lane mask for the current iteration - we want to create the mask for the next iteration. This requires a PHI to carry the live value around the loop and is the only way to use the mask for control flow because at the point of branching we want to know if there are any active lanes in the mask for the *next* iteration.
> 
> With particular reference to SVE, the motivation for this work is to use the 'whilelo' instruction to both generate the lane mask and set the flags to branch on. Effectively, the whilelo instruction is doing the comparison already, which makes the traditional scalar IV comparison redundant.
> 
> I could be wrong, but I believe this form of vectorised loop would be beneficial for some other targets with a predicated instruction set too, such as RISC-V.
> With particular reference to SVE, the motivation for this work is to use the 'whilelo' instruction to both generate the lane mask and set the flags to branch on. 

Oh right, I missed this in the test case I was looking at! I originally thought the intention of the new phi recipe was to encode some extra guarantees/information like we do for inductions or reductions. 

>From the latest comment, it sounds like there would be no need to have a new recipe class for the phi, but perhaps VPWidenPHIRecipe could be used instead, if the setup can be moved to the pre-header.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1568
+        createStepForVF(State.Builder, TC->getType(), State.VF, Part);
+    Instruction *StartMask = State.Builder.CreateIntrinsic(
+        Intrinsic::get_active_lane_mask, {PredTy, TC->getType()}, {StartIV, TC},
----------------
david-arm wrote:
> fhahn wrote:
> > The start value should be created in the preheader block in VPlan instead of in the phi node here. There there should also be no need for a TC argument?
> Hi @fhahn, I'm not really sure what you mean here. Do you mean that LoopVectorize.cpp should create a VPInstruction with type ActiveLaneMask that lives in the VPlan's preheader block? If so, then VPInstruction::ActiveLaneMask needs a trip count operand.
> 
> Or do you mean I should deviate from the expected recipe creation somehow in a VPlan function such as prepareToExecute by manually adding the incoming value?
> Do you mean that LoopVectorize.cpp should create a VPInstruction with type ActiveLaneMask that lives in the VPlan's preheader block?

Yes exactly.

>  If so, then VPInstruction::ActiveLaneMask needs a trip count operand.

Looking at `VPInstruction::generateInstruction`, it looks like it already takes a trip count operand? The first operand can be set to zero by wrapping a zero constant in a VPValue.

```
  case VPInstruction::ActiveLaneMask: {
    // Get first lane of vector induction variable.
    Value *VIVElem0 = State.get(getOperand(0), VPIteration(Part, 0));
    // Get the original loop tripcount.
    Value *ScalarTC = State.get(getOperand(1), Part);
```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125301/new/

https://reviews.llvm.org/D125301