[llvm] [SLP]Add support for strided loads. (PR #80310)
Alexey Bataev via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 8 13:10:26 PST 2024
================
@@ -3930,30 +3943,71 @@ static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
std::optional<int> Diff =
getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
// Check that the sorted loads are consecutive.
- if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+ if (static_cast<unsigned>(*Diff) == Sz - 1)
return LoadsState::Vectorize;
// Simple check if not a strided access - clear order.
- IsPossibleStrided = *Diff % (VL.size() - 1) == 0;
+ bool IsPossibleStrided = *Diff % (Sz - 1) == 0;
+ // Try to generate strided load node if:
+ // 1. Target with strided load support is detected.
+ // 2. The number of loads is greater than MinProfitableStridedLoads,
+ // or the potential stride <= MaxProfitableLoadStride and the
+ // potential stride is power-of-2 (to avoid perf regressions for the very
+ // small number of loads) and max distance > number of loads, or potential
+ // stride is -1.
+ // 3. The loads are ordered, or number of unordered loads <=
+ // MaxProfitableUnorderedLoads, or loads are in reversed order.
+ // (this check is to avoid extra costs for very expensive shuffles).
+ if (IsPossibleStrided && (((Sz > MinProfitableStridedLoads ||
+ (static_cast<unsigned>(std::abs(*Diff)) <=
+ MaxProfitableLoadStride * Sz &&
+ isPowerOf2_32(std::abs(*Diff)))) &&
+ static_cast<unsigned>(std::abs(*Diff)) > Sz) ||
+ *Diff == -(static_cast<int>(Sz) - 1))) {
+ int Stride = *Diff / static_cast<int>(Sz - 1);
+ if (*Diff == Stride * static_cast<int>(Sz - 1)) {
+ Align Alignment =
+ cast<LoadInst>(Order.empty() ? VL.front() : VL[Order.front()])
+ ->getAlign();
+ if (TTI.isLegalStridedLoadStore(VecTy, Alignment)) {
+ // Iterate through all pointers and check if all distances are
+ // unique multiple of Dist.
+ SmallSet<int, 4> Dists;
+ for (Value *Ptr : PointerOps) {
+ int Dist = 0;
+ if (Ptr == PtrN)
+ Dist = *Diff;
+ else if (Ptr != Ptr0)
+ Dist = *getPointersDiff(ScalarTy, Ptr0, ScalarTy, Ptr, DL, SE);
+ // If the strides are not the same or repeated, we can't
+ // vectorize.
+ if (((Dist / Stride) * Stride) != Dist ||
+ !Dists.insert(Dist).second)
+ break;
+ }
----------------
alexey-bataev wrote:
Currently it emits a strided load followed by a shuffle. It may also emit a shuffle followed by the store, if graph rotation shows that this is profitable. Both scenarios are possible; in either case SLP tries to minimize the number of shuffles.
https://github.com/llvm/llvm-project/pull/80310
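
For readers following the hunk above, here is a minimal standalone sketch of the stride-uniformity check the loop appears to perform: every distance from the lowest pointer must be a multiple of the candidate stride, and no distance may repeat. It is not LLVM code. The name findCommonStride is hypothetical, element offsets are plain ints instead of getPointersDiff results, std::set stands in for SmallSet, and the alignment, target-legality and profitability conditions from the patch are deliberately left out.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical helper (not part of LLVM). Offsets holds the position of each
// load in units of the scalar type, in any order. Returns the common stride
// if the loads form a strided sequence, or 0 if they do not.
int findCommonStride(const std::vector<int> &Offsets) {
  const int Sz = static_cast<int>(Offsets.size());
  if (Sz < 2)
    return 0;

  // Distance between the lowest and highest address, as Ptr0/PtrN in the hunk.
  const int Base = *std::min_element(Offsets.begin(), Offsets.end());
  const int Diff = *std::max_element(Offsets.begin(), Offsets.end()) - Base;

  // The total distance must split evenly into Sz - 1 steps.
  if (Diff == 0 || Diff % (Sz - 1) != 0)
    return 0;
  const int Stride = Diff / (Sz - 1);

  // Every distance must be a multiple of Stride and must not repeat.
  std::set<int> Dists;
  for (int Offset : Offsets) {
    const int Dist = Offset - Base;
    if (Dist % Stride != 0 || !Dists.insert(Dist).second)
      return 0;
  }
  return Stride;
}
```

With Sz distinct multiples of Stride between 0 and Stride * (Sz - 1), the distances necessarily cover every step exactly once, which is why uniqueness plus divisibility is enough. For example, offsets {6, 0, 9, 3} give a stride of 3 even though they are unordered, while {0, 2, 2, 6} is rejected because of the repeated distance.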