[PATCH] D125301: [LoopVectorize] Add option to use active lane mask for loop control flow

Tue Jun 14 04:14:53 PDT 2022

david-arm marked 6 inline comments as done.
david-arm added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:764
+    BranchOnCond,
+    BranchOnActiveLaneMask,
   };
----------------
david-arm wrote:
> fhahn wrote:
> > Can `BranchOnCond` be used instead of the dedicated `BranchOnActiveLaneMask`?
> I created this patch a month ago, which predated your BranchOnCond work. That's why I haven't used it. I can certainly look into this and see whether it's possible, though.
I looked into this. To do it this way I have to generate the Not and ExtractElement operations explicitly as VPInstructions, which requires a new VPInstruction::ExtractElement opcode. That is possible, but I wasn't sure what the semantics of the new instruction should be. When I pass in a scalar constant of 0 for the lane, it gets widened to something like <vscale x 4 x i32> zeroinitializer for every part. However, I only need a single lane, so I'd have to do something like:

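  case VPInstruction::ExtractElement: {
    // Widened vector operand for the current unroll part.
    Value *Vec = State.get(getOperand(0), Part);
    // Lane index taken as a scalar from part 0, lane 0 only.
    Value *Lane = State.get(getOperand(1), VPIteration(0, 0));
    Value *V = Builder.CreateExtractElement(Vec, Lane);
    // Record the extracted scalar as this instruction's result for the part.
    State.set(this, V, Part);
    break;
  }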

It feels quite inefficient to go to all the effort of widening, only to discard everything!
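
To make the comparison concrete, here is a rough standalone sketch (plain C++, not VPlan code) of the two ways the latch condition could be formed. Everything in it is an assumption for illustration: activeLaneMask is a hypothetical stand-in for the lane mask computation (lane i is active when Index + i < TripCount), and exitOnActiveLaneMask, exitOnCond and countVectorIterations are made-up names. It also assumes the loop exits once lane 0 of the next iteration's mask is inactive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an active lane mask: lane i is on when Index + i < TripCount.
std::vector<bool> activeLaneMask(uint64_t Index, uint64_t TripCount, unsigned VF) {
  assert(VF > 0 && "need at least one lane");
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = Index + Lane < TripCount;
  return Mask;
}

// Dedicated form: look only at lane 0 of the mask and exit when it is off.
bool exitOnActiveLaneMask(const std::vector<bool> &Mask) { return !Mask[0]; }

// Decomposed form: Not over the whole mask, then extract lane 0, then branch
// on that scalar. Every lane is negated even though only lane 0 is used.
bool exitOnCond(const std::vector<bool> &Mask) {
  std::vector<bool> NotMask(Mask.size());
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane)
    NotMask[Lane] = !Mask[Lane];
  return NotMask[0];
}

// Count how many times the vector body runs (it always runs at least once
// here; a real loop would be guarded for a zero trip count).
unsigned countVectorIterations(uint64_t TripCount, unsigned VF, bool UseDecomposed) {
  unsigned Iterations = 0;
  uint64_t Index = 0;
  while (true) {
    ++Iterations; // the masked vector body for the current Index runs here
    Index += VF;
    std::vector<bool> Next = activeLaneMask(Index, TripCount, VF);
    bool Exit = UseDecomposed ? exitOnCond(Next) : exitOnActiveLaneMask(Next);
    if (Exit)
      break;
  }
  return Iterations;
}
```

Under these assumptions both forms take the same branch for every mask, so the decomposed version only adds work on lanes that are never read.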

If you would still prefer that I proceed with this approach, I'm happy to try. Could you share your thoughts on what the new ExtractElement operation should look like?

https://reviews.llvm.org/D125301