[llvm] [LV] Convert gather loads with invariant stride into strided loads (PR #147297)

Thu Oct 23 02:24:00 PDT 2025

================
@@ -3307,6 +3332,52 @@ struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe, public VPValue {
   }
 };
 
+/// A recipe for strided load operations, using the base address, stride, and an
+/// optional mask. This recipe will generate an vp.strided.load intrinsic call
+/// to represent memory accesses with a fixed stride.
+struct VPWidenStridedLoadRecipe final : public VPWidenMemoryRecipe,
+                                        public VPValue {
+  VPWidenStridedLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *Stride,
----------------
Mel-Chen wrote:

> Can we store the required info independent of the LLVM IR instruction?
> 
Maybe, if we also store the type information in the recipe. But I would like to understand: in what scenario would we create a VPWidenStridedLoadRecipe without a LoadInst?

> It would also help to clarify what information is actually used and why this cannot be simply a VPWidenIntrinsicsRecipe, as it would map 1-1 to an intrinsic

I tried to do that: https://github.com/Mel-Chen/llvm-project/commit/d19fe61cdf5fca6168bf5048c5563a5ec62c4912
This approach does eliminate one recipe, VPWidenStridedLoadRecipe, but a few details still need care:
First, the current VPWidenIntrinsicRecipe cannot set attributes, so the alignment information is lost. This could be resolved by passing an AttributeList to VPWidenIntrinsicRecipe, so that it can add the attributes during ::execute.
```
      // ???: How to set alignment?
      auto *StridedLoad = new VPWidenIntrinsicRecipe(
          Ingredient, Intrinsic::experimental_vp_strided_load,
          {NewPtr, StrideInBytes, Mask, I32VF},
          TypeInfo.inferScalarType(LoadR), LoadR->getDebugLoc());
```
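For readers less familiar with the operand list above, here is a minimal scalar sketch of what a masked, EVL-bounded strided load computes. It is only an illustrative model, not LLVM code: `stridedLoadModel` and its signature are hypothetical, and empty `std::optional` lanes stand in for the poison lanes the intrinsic would produce.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical scalar model (not an LLVM API). Operand order mirrors the
// snippet above: base pointer, stride in bytes, mask, explicit vector length.
// Lane i reads sizeof(T) bytes at Base + i * StrideInBytes when i < EVL and
// Mask[i] is set; every other lane stays empty (poison in the real intrinsic).
template <typename T>
std::vector<std::optional<T>>
stridedLoadModel(const unsigned char *Base, std::int64_t StrideInBytes,
                 const std::vector<bool> &Mask, std::uint32_t EVL) {
  assert(EVL <= Mask.size() && "EVL must not exceed the vector length");
  std::vector<std::optional<T>> Result(Mask.size());
  for (std::uint32_t Lane = 0; Lane < EVL; ++Lane) {
    if (!Mask[Lane])
      continue; // Masked-off lane: no memory access happens.
    T Value;
    // memcpy avoids alignment assumptions, which the model does not track.
    std::memcpy(&Value, Base + static_cast<std::int64_t>(Lane) * StrideInBytes,
                sizeof(T));
    Result[Lane] = Value;
  }
  return Result;
}
```

Note that the model deliberately ignores alignment, which is exactly the piece of information that would have to travel as an attribute in the intrinsic-based approach.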
Next, it might become difficult to ensure profitability before generating the strided access (i.e., we may not be able to achieve the change suggested in [this comment](https://github.com/llvm/llvm-project/pull/147297#discussion_r2394050261)). For a more accurate profitability analysis, it would be better to call getCostForIntrinsics directly during the profitability check, which requires all operands to be prepared already.
```
        // It would be better to turn getCostForIntrinsics into a utility
        // function and call it directly to get the cost.
        SmallVector<Type *, 4> ParamTys = {
            TypeInfo.inferScalarType(BasePtr),
            TypeInfo.inferScalarType(StrideInElement),
            Type::getInt1Ty(Plan.getContext()),
            Type::getInt32Ty(Plan.getContext())};
        // FIXME: Copied from getCostForIntrinsics, but I think this is a bug:
        // not every operand needs to be vectorized. This should be fixed in
        // getCostForIntrinsics.
        for (auto &Ty : ParamTys)
          Ty = toVectorTy(Ty, VF);
        IntrinsicCostAttributes CostAttrs(
            Intrinsic::experimental_vp_strided_load, DataTy, {}, ParamTys,
            FastMathFlags(), nullptr, InstructionCost::getInvalid(), &Ctx.TLI);
        const InstructionCost StridedLoadStoreCost =
            Ctx.TTI.getIntrinsicInstrCost(CostAttrs, Ctx.CostKind);
        return StridedLoadStoreCost < CurrentCost;
```
Finally, using VPWidenIntrinsicRecipe reduces readability during transformations. Currently, VPWidenStridedLoadRecipe provides members like getAddr() and getStride(), which make the code easier to read. This issue is not limited to EVL lowering; it should also affect #164205.
```
      .Case<VPWidenIntrinsicRecipe>(
          [&](VPWidenIntrinsicRecipe *WI) -> VPRecipeBase * {
            if (WI->getVectorIntrinsicID() ==
                Intrinsic::experimental_vp_strided_load) {
              VPWidenIntrinsicRecipe *NewWI = WI->clone();
              if (VPValue *NewMask = GetNewMask(WI->getOperand(2)))
                NewWI->setOperand(2, NewMask);
              else
                NewWI->setOperand(2, &AllOneMask);
              NewWI->setOperand(3, &EVL);
              return NewWI;
            }
            return nullptr;
          })
```
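To make the readability point concrete, here is a self-contained toy sketch of the trade-off. None of these types exist in LLVM: `ToyValue`, `ToyIntrinsicRecipe`, and `ToyStridedLoadView` are hypothetical stand-ins, and the operand order (address, stride, mask, EVL) is assumed from the snippet above. It shows how named accessors hide the magic operand indices 2 and 3.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are not VPlan classes.
struct ToyValue {
  std::string Name;
};

struct ToyIntrinsicRecipe {
  std::vector<ToyValue *> Operands;
  ToyValue *getOperand(std::size_t Idx) const { return Operands.at(Idx); }
  void setOperand(std::size_t Idx, ToyValue *V) { Operands.at(Idx) = V; }
};

// Hypothetical thin view giving the operands names, similar in spirit to the
// getAddr()/getStride() members of a dedicated recipe.
class ToyStridedLoadView {
  ToyIntrinsicRecipe &Recipe;
  enum OperandIdx : std::size_t { Addr = 0, Stride = 1, Mask = 2, EVL = 3 };

public:
  explicit ToyStridedLoadView(ToyIntrinsicRecipe &R) : Recipe(R) {
    assert(R.Operands.size() == 4 && "expected addr, stride, mask, EVL");
  }
  ToyValue *getAddr() const { return Recipe.getOperand(Addr); }
  ToyValue *getStride() const { return Recipe.getOperand(Stride); }
  ToyValue *getMask() const { return Recipe.getOperand(Mask); }
  ToyValue *getEVL() const { return Recipe.getOperand(EVL); }
  void setMask(ToyValue *V) { Recipe.setOperand(Mask, V); }
  void setEVL(ToyValue *V) { Recipe.setOperand(EVL, V); }
};
```

With such a view, the EVL rewrite above would read as `View.setMask(NewMask)` and `View.setEVL(&EVL)` instead of `setOperand(2, ...)` and `setOperand(3, ...)`. A dedicated recipe provides this for free, which is the readability argument being made here.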
That’s my take on this. What’s your opinion?

https://github.com/llvm/llvm-project/pull/147297