[llvm] [LV] Convert gather loads with invariant stride into strided loads (PR #147297)
Mel Chen via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 23 02:24:00 PDT 2025
================
@@ -3307,6 +3332,52 @@ struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe, public VPValue {
}
};
+/// A recipe for strided load operations, using the base address, stride, and an
+/// optional mask. This recipe will generate an vp.strided.load intrinsic call
+/// to represent memory accesses with a fixed stride.
+struct VPWidenStridedLoadRecipe final : public VPWidenMemoryRecipe,
+ public VPValue {
+ VPWidenStridedLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *Stride,
----------------
Mel-Chen wrote:
> Can we store the required info independent of the LLVM IR instruction?
>
Maybe, if we also store the type information in the recipe. But in what scenario would we create a VPWidenStridedLoadRecipe without a LoadInst?
> It would also help to clarify what information is actually used and why this cannot be simply a VPWidenIntrinsicsRecipe, as it would map 1-1 to an intrinsic
I tried to do that: https://github.com/Mel-Chen/llvm-project/commit/d19fe61cdf5fca6168bf5048c5563a5ec62c4912
This approach does eliminate one recipe, VPWidenStridedLoadRecipe, but there are still a few details we need to be careful about:
First, the current VPWidenIntrinsicRecipe cannot set attributes, so the alignment information would be lost. This could be resolved by passing an AttributeList to VPWidenIntrinsicRecipe, allowing it to add the attributes during ::execute.
```
// ???: How to set alignment?
auto *StridedLoad = new VPWidenIntrinsicRecipe(
    Ingredient, Intrinsic::experimental_vp_strided_load,
    {NewPtr, StrideInBytes, Mask, I32VF},
    TypeInfo.inferScalarType(LoadR), LoadR->getDebugLoc());
```
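As a reading aid for the operand list above ({pointer, stride in bytes, mask, EVL}), here is a scalar reference model of what a strided load computes. It is a plain C++ sketch, not LLVM code: `stridedLoadModel` is a hypothetical name, the element type is fixed to a 32-bit integer for brevity, and inactive lanes (poison in the real intrinsic) are modelled as `std::nullopt`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Scalar reference model of a strided load over 32-bit elements.
// Lane i reads from Base + i * StrideInBytes. Lanes that are masked off or
// at/after EVL are not loaded and stay std::nullopt (standing in for poison).
std::vector<std::optional<int32_t>>
stridedLoadModel(const unsigned char *Base, int64_t StrideInBytes,
                 const std::vector<bool> &Mask, uint32_t EVL) {
  // Assumed precondition: EVL never exceeds the number of lanes.
  assert(EVL <= Mask.size() && "EVL must not exceed the vector length");
  std::vector<std::optional<int32_t>> Result(Mask.size());
  for (size_t Lane = 0; Lane < Mask.size(); ++Lane) {
    if (Lane >= EVL || !Mask[Lane])
      continue;
    int32_t Value;
    std::memcpy(&Value, Base + static_cast<int64_t>(Lane) * StrideInBytes,
                sizeof(Value));
    Result[Lane] = Value;
  }
  return Result;
}
```

The model deliberately has no notion of alignment; that is exactly the piece of information that would have to be carried as an attribute in the intrinsic-based approach.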
Next, it might become difficult to ensure profitability before generating the strided access (i.e., we may not be able to make the change suggested in [this comment](https://github.com/llvm/llvm-project/pull/147297#discussion_r2394050261)). For more accurate profitability analysis, it would be better to call getCostForIntrinsics directly during the profitability check, which requires that all operands are already prepared.
```
// It would be better to turn getCostForIntrinsics into a utility function
// and call it directly to get the cost.
SmallVector<Type *, 4> ParamTys = {
    TypeInfo.inferScalarType(BasePtr),
    TypeInfo.inferScalarType(StrideInElement),
    Type::getInt1Ty(Plan.getContext()),
    Type::getInt32Ty(Plan.getContext())};
// FIXME: Copied from getCostForIntrinsics, but I think this is a bug: not
// every operand has to be vectorized. Should be fixed in
// getCostForIntrinsics.
for (auto &Ty : ParamTys)
  Ty = toVectorTy(Ty, VF);
IntrinsicCostAttributes CostAttrs(
    Intrinsic::experimental_vp_strided_load, DataTy, {}, ParamTys,
    FastMathFlags(), nullptr, InstructionCost::getInvalid(), &Ctx.TLI);
const InstructionCost StridedLoadStoreCost =
    Ctx.TTI.getIntrinsicInstrCost(CostAttrs, Ctx.CostKind);
return StridedLoadStoreCost < CurrentCost;
```
Finally, using VPWidenIntrinsicRecipe reduces readability during transformations. Currently, VPWidenStridedLoadRecipe provides members like getAddr() and getStride(), which make the transformation code easier to read. This issue is not limited to EVL lowering; it should also affect #164205.
```
.Case<VPWidenIntrinsicRecipe>(
    [&](VPWidenIntrinsicRecipe *WI) -> VPRecipeBase * {
      if (WI->getVectorIntrinsicID() ==
          Intrinsic::experimental_vp_strided_load) {
        VPWidenIntrinsicRecipe *NewWI = WI->clone();
        if (VPValue *NewMask = GetNewMask(WI->getOperand(2)))
          NewWI->setOperand(2, NewMask);
        else
          NewWI->setOperand(2, &AllOneMask);
        NewWI->setOperand(3, &EVL);
        return NewWI;
      }
      return nullptr;
    })
```
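To make the readability point concrete, here is a minimal standalone sketch of the difference between positional operands and named accessors. It is plain C++ rather than VPlan code, and `ToyIntrinsicRecipe` and `ToyStridedLoadView` are hypothetical names that exist only for this illustration; nothing like the view is part of the patch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generic intrinsic recipe: operands are only
// reachable by position, as with getOperand(2) / setOperand(3, ...).
struct ToyIntrinsicRecipe {
  std::vector<std::string> Operands;
};

// Hypothetical non-owning view that gives the four strided-load operands
// names, mirroring what a dedicated recipe offers via getAddr()/getStride().
class ToyStridedLoadView {
  ToyIntrinsicRecipe &R;

public:
  explicit ToyStridedLoadView(ToyIntrinsicRecipe &Recipe) : R(Recipe) {
    assert(R.Operands.size() == 4 && "expected {addr, stride, mask, evl}");
  }
  const std::string &getAddr() const { return R.Operands[0]; }
  const std::string &getStride() const { return R.Operands[1]; }
  const std::string &getMask() const { return R.Operands[2]; }
  const std::string &getEVL() const { return R.Operands[3]; }
  void setMask(std::string M) { R.Operands[2] = std::move(M); }
  void setEVL(std::string E) { R.Operands[3] = std::move(E); }
};
```

With named accessors, the EVL rewrite reads as `setMask(...)` and `setEVL(...)` instead of `setOperand(2, ...)` and `setOperand(3, ...)`, so a reader does not need to remember the operand order of the intrinsic.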
That’s my take on this. What’s your opinion?
https://github.com/llvm/llvm-project/pull/147297