[PATCH] D99750: [LV, VP]VP intrinsics support for the Loop Vectorizer

Fri Aug 4 03:35:51 PDT 2023

fhahn added a comment.

Thanks for the update! It looks like there's a remaining unit test failure (`VPVerifierTest.VPInstructionUseBeforeDefSameBB`). Some more comments in-line.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9858
+      // not nullptr it also implies preference for predicated vectorization.
+      Value *EVLPart = State.EVL ? State.get(State.EVL, Part) : nullptr;
       if (CreateGatherScatter) {
----------------
Can this be sunk inside the ` else {`?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9931
+        Value *BlockInMaskPart =
+            isMaskRequired ? MaskValue(Part, DataTy->getElementCount())
+                           : Builder.getTrueVector(DataTy->getElementCount());
----------------
Can the `maskRequired` check be sunk into `MaskValue` or even better be taken care on recipe construction?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9886
+
+          NewSI = Builder.CreateCall(
+              VPIntr, {StoredVal, VecPtr, BlockInMaskPart, EVLPart});
----------------
craig.topper wrote:
> Should set the alignment attribute on the pointer operand.
> 
> Something like
> ```
> NewSI->addParamAttr(1, Attribute::getWithAlignment(NewSI->getContext(), Alignment));
> ```
Is there a reason for creating the VP intrinsic call explicitly rather than using the VP builder?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9845
+    if (auto *IMask = dyn_cast<VPInstruction>(Mask))
+      if (IMask->getOpcode() == VPInstruction::ICmpULE)
+        return Builder.getTrueVector(EC);
----------------
ABataev wrote:
> fhahn wrote:
> > It's not clear to me what the reasoning is here to simplify this to the all-true mask without checking the operands of the compare. If the compare can be simplified to the all-true mask, then this would be suitable to do as VPlan-to-VPlan simplification instead of during codegen.
> Added extra check.
Thanks for the update. Looking at this again, it seems like it would be better to either perform this simplification when creating the tail-folding mask or optimize as VPlan-to-VPlan transform, as presumably this applies to all users of the top-level mask? Might be good to tie in with D157037 which would always create the top-level mask in the beginning when tail-folding.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:245
+  // If EVL is not nullptr, then EVL must be a valid value set during plan
+  // creation, possibly default value = whole vector register length. EVL is
+  // created only if TTI prefers predicated vectorization, thus if EVL is
----------------
nit: `possibly a default value `? Turn into a doc comment `///`

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2059

+/// A recipe to generate Explicit Vector Length (EVL) value to be used with
+/// VPred intrinsics.
----------------
nit: A recipe to generate `an `Explicit...

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2060
+/// A recipe to generate Explicit Vector Length (EVL) value to be used with
+/// VPred intrinsics.
+/// The following three ways to compute the EVL parameter for the VP Intrinsics.
----------------
nit: might be good to use `vector predication` instead of `VPred`.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2075
+class VPEVLRecipe : public VPRecipeBase, public VPValue {
+public:
+  VPEVLRecipe(VPValue *IV, VPValue *TC)
----------------
nit: turn into struct if only public members?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2065
+/// for current tail-folding implementation.
+/// 2. The second way is to insert instructions to compute min(VF, trip_count -
+/// index) for each vector iteration.
----------------
ABataev wrote:
> fhahn wrote:
> > ABataev wrote:
> > > fhahn wrote:
> > > > IIUC this is what the recipe actually computes at the moment, right?
> > > No, currently it emits number 3. Other are not fully supported, they will require some extra work.
> > Ah I see, would be good to update
> Will update the comment
Thanks for the update! IIUC options 1) and 2) don't need to be handled by `VPEVLRecipe`, 1) would just be a live-in VPValue and 2) could be a set VPInsructions to compute it. Would it make sense to only focus on 3) for EVL recipe.

It would be good to mention that the recipe effectively sets the EVL for the remainder of the region.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:292
+    Value *EVL = HasEVL ? State.get(getOperand(1), Part) : nullptr;
+    if (Part == 0 || EVL) {
       bool IsNUW = getOpcode() == VPInstruction::CanonicalIVIncrementNUW;
----------------
IIUC the `EVL` check here is not needed? Can have an asset that `EVL` implies `Part == 0`. IIUC Part > 0 could be a problem is that there could be different EVL's for different parts?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp:205

+  // Check if VPEVLRecipe exists only in Entry block an only once. 
+  bool VPEVLFound = false;
----------------
nit: Check if VPEVLRecipe  exists only in Entry bloc `and` only once?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99750/new/

https://reviews.llvm.org/D99750