[PATCH] D99750: [LV, VP] RFC: VP intrinsics support for the Loop Vectorizer (Proof-of-Concept)

Wed Jul 19 12:11:19 PDT 2023

fhahn added a comment.

Thanks for the updates! It would also be good to update the description so it reflects the current status. In terms of testing, would it be possible to have most tests target-independent (using `-prefer-predicate-with-vp-intrinsics=force-explicit-vector-length-support` ) and limit the RISCV-dependent tests to cost modeling/checking of defaults? That ensures that we have wide test coverage by default (e.g. if someone doesn't have RISCV as enabled target).

================
Comment at: llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp:196
+
+  if (DataTy->isFloatTy())
+    return ST->hasVInstructionsF32();
----------------
AFACIT this is all dead code for now?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1650
+    return PreferVPIntrinsics && foldTailByMasking() &&
+           Legal->getMaxSafeDepDistBytes() == -1U;
+  }
----------------
Looks like this needs a test and the TODO from the recipe should probably go here to make clear why there's this restriction for now

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8852
+    VPEVL = new VPEVLRecipe(Plan.getCanonicalIV(), &Plan.getVectorTripCount());
+    Header->appendRecipe(VPEVL);
+  }
----------------
There should be only a single VPEVL recipe, would be good to check this in the VPlanVerifier.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8859
+  if (VPEVL)
+    Args.push_back(VPEVL);
+
----------------
nit: would it be possible to move the code to create the VPEVLRecipe here, reducing the scope of the `VPEVL` variable?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9845
+    if (auto *IMask = dyn_cast<VPInstruction>(Mask))
+      if (IMask->getOpcode() == VPInstruction::ICmpULE)
+        return Builder.getTrueVector(EC);
----------------
It's not clear to me what the reasoning is here to simplify this to the all-true mask without checking the operands of the compare. If the compare can be simplified to the all-true mask, then this would be suitable to do as VPlan-to-VPlan simplification instead of during codegen.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9880
+        // intrinsic.
+        if (EVLPart) {
+          auto *StoredValTy = cast<VectorType>(StoredVal->getType());
----------------
For correctness, would we also need special handling for interleave groups?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9882
+          auto *StoredValTy = cast<VectorType>(StoredVal->getType());
+          BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+          Function *VPIntr = Intrinsic::getDeclaration(
----------------
A more direct why that doesn't require accessing the preheader would be to use something like `State.Builder.GetInsertBlock()->getModule();`

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9071
+        assert(isa<TruncInst>(Instr) ||
+               isa<VPEVLRecipe>(HeaderVPBB->getFirstNonPhi()));
         Recipe->insertBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());
----------------
ABataev wrote:
> fhahn wrote:
> > Is this still needed? The latest version only introduces `VPEVLRecipe` in `addCanonicalIVRecipes`.
> Still needed. VPWidenIntOrFpInductionRecipe is emitted after addCanonicalIVRecipes and needs to be inserted as a PHI recipe. I'm not sure about TODO, I think it should be removed. But we still need to insert VPWidenIntOrFpInductionRecipe before VPEVLRecipe, which is emitted immediately after canonical IV recipe.
I think the TODO is not really needed. As the EVL recipe needs to be created before other recipe construction, there's not much we can do about it.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2886
+    Value *TC = getTripCount();
+    // Loop has multiple exits. Make sure scalar remainder executes at least 1
+    // scalar iteration to perform correct jump.
----------------
ABataev wrote:
> fhahn wrote:
> > Is there a test with multiple exits?
> It currently not supported by this patch. This is just an inital patch, it won't handle all the corner cases.
Reading this back now, the check below is for loops requiring scalar epilogues, not necessarily multiple exits. Would be good to update the comment and also add a test to make sure this is captured. AFAICT this is something that the patch already handles and should be tested (e.g. a test with an interleave group that requires scalar epilogue)

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:295
       auto *Phi = State.get(getOperand(0), 0);
       // The loop step is equal to the vectorization factor (num of SIMD
       // elements) times the unroll factor (num of SIMD instructions).
----------------
this comment is out of place now?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:298
+      if (HasEVL) {
+         EVL = Builder.CreateIntCast(EVL, Phi->getType(), /*isSigned=*/false);
+         if (State.UF > 1)
----------------
This always needs to be a zero-extend or no-op, right? If it is a truncate that may be incorrect, right? Would be good to add an assert if possible.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:299
+         EVL = Builder.CreateIntCast(EVL, Phi->getType(), /*isSigned=*/false);
+         if (State.UF > 1)
+           EVL = Builder.CreateMul(EVL,
----------------
Is State.UF > 1 possible at the moment? It looks like there's a bailout for UserIC and UF/IC=1 is selected if vector predication is enabled.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:1159
+  printAsOperand(O, SlotTracker);
+  O << " = EXPLICIT-VECTOR-LENGTH";
+}
----------------
also needs to print the operands ,  ` printOperands(O, SlotTracker);` should do

================
Comment at: llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll:446
+entry:
+  %cmp10 = icmp sgt i32 %N, 0
+  br i1 %cmp10, label %for.body.preheader, label %for.cond.cleanup
----------------
nit: not needed? . (same for other tests)

================
Comment at: llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll:450
+for.body.preheader:
+  %wide.trip.count = zext i32 %N to i64
+  br label %for.body
----------------
nit: not needed, could pass %N as i64. (same for other tests)

================
Comment at: llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll:453
+
+for.cond.cleanup.loopexit:
+  br label %for.cond.cleanup
----------------
unnecessary block

================
Comment at: llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll:460
+for.body:
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i32, ptr %b, i64 %indvars.iv
----------------
nit: strip unnecessary `indvars.` prefix

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99750/new/

https://reviews.llvm.org/D99750