[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
Sander de Smalen via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Nov 20 06:56:10 PST 2025
================
@@ -8974,11 +8982,104 @@ void LoopVectorizationPlanner::attachRuntimeChecks(
assert((!CM.OptForSize ||
CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled) &&
"Cannot SCEV check stride or overflow when optimizing for size");
- VPlanTransforms::attachCheckBlock(Plan, SCEVCheckCond, SCEVCheckBlock,
+ VPlanTransforms::attachCheckBlock(Plan, Plan.getOrAddLiveIn(SCEVCheckCond),
+ Plan.createVPIRBasicBlock(SCEVCheckBlock),
HasBranchWeights);
}
const auto &[MemCheckCond, MemCheckBlock] = RTChecks.getMemRuntimeChecks();
if (MemCheckBlock && MemCheckBlock->hasNPredecessors(0)) {
+ VPValue *MemCheckCondVPV = Plan.getOrAddLiveIn(MemCheckCond);
+ VPBasicBlock *MemCheckBlockVP = Plan.createVPIRBasicBlock(MemCheckBlock);
+ std::optional<ArrayRef<PointerDiffInfo>> ChecksOpt =
+ CM.Legal->getRuntimePointerChecking()->getDiffChecks();
+
+ // Create a mask enabling safe elements for each iteration.
+ if (CM.getRTCheckStyle(TTI) == RTCheckStyle::UseSafeEltsMask &&
+ ChecksOpt.has_value() && ChecksOpt->size() > 0) {
+ ArrayRef<PointerDiffInfo> Checks = *ChecksOpt;
+ VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+ VPBasicBlock *LoopBody = LoopRegion->getEntryBasicBlock();
+ VPBuilder Builder(MemCheckBlockVP);
+
+ // Create a mask for each possibly-aliasing pointer pair, ANDing them if
+ // there's more than one pair.
+ VPValue *AliasMask = nullptr;
+ for (PointerDiffInfo Check : Checks) {
+ VPValue *Sink =
+ vputils::getOrCreateVPValueForSCEVExpr(Plan, Check.SinkStart);
+ VPValue *Src =
+ vputils::getOrCreateVPValueForSCEVExpr(Plan, Check.SrcStart);
+
+ Type *PtrType = PointerType::getUnqual(Plan.getContext());
+ Sink = Builder.createScalarCast(Instruction::CastOps::IntToPtr, Sink,
+ PtrType, DebugLoc());
+ Src = Builder.createScalarCast(Instruction::CastOps::IntToPtr, Src,
+ PtrType, DebugLoc());
+
+ SmallVector<VPValue *, 3> Ops{
+ Src, Sink,
+ Plan.getConstantInt(IntegerType::getInt64Ty(Plan.getContext()),
+ Check.AccessSize)};
+ VPWidenIntrinsicRecipe *M = new VPWidenIntrinsicRecipe(
+ Check.WriteAfterRead ? Intrinsic::loop_dependence_war_mask
+ : Intrinsic::loop_dependence_raw_mask,
+ Ops, IntegerType::getInt1Ty(Plan.getContext()));
+ MemCheckBlockVP->appendRecipe(M);
+ if (AliasMask)
+ AliasMask = Builder.createAnd(AliasMask, M);
+ else
+ AliasMask = M;
+ }
+ assert(AliasMask && "Expected an alias mask to have been created");
+
+ // Replace uses of the loop body's active lane mask phi with an AND of the
+ // phi and the alias mask.
+ for (VPRecipeBase &R : *LoopBody) {
+ auto *MaskPhi = dyn_cast<VPActiveLaneMaskPHIRecipe>(&R);
----------------
sdesmalen-arm wrote:
I believe the transform is currently incorrect. When there is no active lane mask, it would create an unpredicated vector loop that handles e.g. VF=16 lanes per iteration, even when the result of the alias mask says that, for example, only 3 lanes can be handled safely. It would then increment the IV by 3 elements, but that doesn't mean only 3 lanes are handled each iteration: without predication, the loop still handles 16 lanes every iteration.
I think there are two options here:
1) If there is no active lane mask in the loop, we could bail out to the scalar loop when the number of safe lanes is < VF.
2) Request the use of an active lane mask (for data) in the loop when an alias mask is required and the target supports one.
I wouldn't mind taking approach 1 first, so that we can already use the whilerw instructions for the alias checks in the check block, rather than a bunch of scalar instructions, and then follow up with option 2.
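
To make the failure mode above concrete, here is a minimal standalone C++ sketch (not LLVM code; the loop, constants, and names are invented for illustration). It models a write-after-read loop with dependence distance 3, "vectorized" at VF=16: stepping the IV by 3 while all 16 lanes execute produces wrong results, whereas predicating the stores down to the 3 lanes the alias mask permits matches the scalar loop.

// Sketch of the predication issue; all names here are invented.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

constexpr int VF = 16;  // vectorization factor
constexpr int N = 60;   // scalar trip count (multiple of Dist)
constexpr int Dist = 3; // WAR distance: a[i] = a[i + Dist] + 1

// One vector iteration at `base`: load all VF lanes up front (as a
// vector load does), then store only the first `lanes` lanes.
// `lanes` models popcount(active-lane-mask & alias-mask).
void vectorBody(int *a, int base, int lanes) {
  int tmp[VF];
  for (int l = 0; l < VF; ++l)
    tmp[l] = a[base + Dist + l]; // whole-vector load
  for (int l = 0; l < lanes; ++l)
    a[base + l] = tmp[l] + 1;    // store, predicated to `lanes`
}

int main() {
  std::vector<int> init(N + Dist + VF); // padded so wide loads fit
  std::iota(init.begin(), init.end(), 0);
  std::vector<int> ref = init, unpred = init, pred = init;

  for (int i = 0; i < N; ++i) // scalar reference
    ref[i] = ref[i + Dist] + 1;

  // Both variants advance the IV by Dist=3, the step the alias mask
  // implies; only the predicated one also limits the executed lanes.
  for (int i = 0; i < N; i += Dist) {
    vectorBody(unpred.data(), i, VF);   // all 16 lanes store
    vectorBody(pred.data(), i, Dist);   // only the 3 safe lanes store
  }

  printf("unpredicated matches scalar: %s\n",
         std::equal(ref.begin(), ref.begin() + N, unpred.begin()) ? "yes"
                                                                  : "no");
  printf("predicated matches scalar:   %s\n",
         std::equal(ref.begin(), ref.begin() + N, pred.begin()) ? "yes"
                                                                : "no");
}

In the sketch, only the predicated variant prints a match: the unpredicated body's lanes 3..15 clobber elements that later iterations still need to read. That is the crux of the two options above: either guarantee the executed lanes never exceed what the alias mask allows (option 2), or bail out to the scalar loop whenever they would (option 1).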
https://github.com/llvm/llvm-project/pull/100579