[llvm] [LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (PR #159388)

Wed Nov 5 08:34:53 PST 2025

================
@@ -831,7 +914,60 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
         }
       }
 
-      if (!IsAllowedAndFast(Alignment)) {
+      // Attempt to extend non-power-of-2 chains to the next power of 2.
+      Chain ExtendingLoadsStores;
+      if (NumVecElems < TargetVF && NumVecElems % 2 != 0 && VecElemBits >= 8) {
+        // TargetVF may be a lot higher than NumVecElems,
+        // so only extend to the next power of 2.
+        assert(VecElemBits % 8 == 0);
+        unsigned VecElemBytes = VecElemBits / 8;
+        unsigned NewNumVecElems = PowerOf2Ceil(NumVecElems);
+        unsigned NewSizeBytes = VecElemBytes * NewNumVecElems;
+
+        assert(NewNumVecElems <= TargetVF);
+
+        LLVM_DEBUG(dbgs() << "LSV: attempting to extend chain of "
+                          << NumVecElems << " "
+                          << (IsLoadChain ? "loads" : "stores") << " to "
+                          << NewNumVecElems << " elements\n");
+        // Do not artificially increase the chain if it becomes misaligned or if
+        // the associated masked load/store is not legal, otherwise we may
+        // unnecessarily split the chain when the target actually supports
+        // non-pow2 VF.
+        if (accessIsAllowedAndFast(NewSizeBytes, AS, Alignment, VecElemBits) &&
+            (IsLoadChain ? TTI.isLegalMaskedLoad(
+                               FixedVectorType::get(VecElemTy, NewNumVecElems),
+                               Alignment, AS, TTI::MaskKind::ConstantMask)
+                         : TTI.isLegalMaskedStore(
+                               FixedVectorType::get(VecElemTy, NewNumVecElems),
+                               Alignment, AS, TTI::MaskKind::ConstantMask))) {
+          LLVM_DEBUG(dbgs()
+                     << "LSV: extending " << (IsLoadChain ? "load" : "store")
+                     << " chain of " << NumVecElems << " "
+                     << (IsLoadChain ? "loads" : "stores")
+                     << " with total byte size of " << SizeBytes << " to "
+                     << NewNumVecElems << " "
+                     << (IsLoadChain ? "loads" : "stores")
+                     << " with total byte size of " << NewSizeBytes
+                     << ", TargetVF=" << TargetVF << " \n");
+
+          // Create (NewNumVecElems - NumVecElems) extra elements.
----------------
dakersnar wrote:

Good question. The heuristic for how many elements to add while gap filling is a little shaky for a few reasons. For one, the heuristic applies to a _single_ gap (a chain might have multiple gaps). We also do not know what sort of final vector we will end up with, as we have yet to analyze alignment information. I think filling 1 element no matter what, and 2 elements when they complete a "set" of 4, is a reasonable heuristic to cover most practical cases where this would be useful.

In contrast, while extending, we are about to vectorize, and we know for sure that we have:
1. a chain with a non-power-of-2 number of elements that will be split due to the alignment check in accessIsAllowedAndFast.
2. a target that supports a larger vector size.
3. an alignment that would support that larger vector size.

At that point we are working with more concrete information: if we extend the chain, we will have a legal vector. At least for NVPTX, it has always been profitable to extend to the next power of two to reduce the number of loads/stores, no matter how many elements get added.
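
To make the sizing rule concrete, here is a minimal standalone sketch of just the arithmetic from the hunk above. `nextPow2` and `extendedNumElems` are hypothetical names for illustration (the patch itself uses `PowerOf2Ceil`), and the alignment and masked load/store legality checks are deliberately left out, so this is not the pass's actual code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::PowerOf2Ceil:
// the smallest power of two that is >= n.
inline uint64_t nextPow2(uint64_t n) {
  uint64_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Returns the element count the chain would be extended to, or numElems
// unchanged when the chain is not a candidate. Mirrors only the guard
// `NumVecElems < TargetVF && NumVecElems % 2 != 0 && VecElemBits >= 8`;
// the TTI legality and alignment checks are assumed to pass.
inline unsigned extendedNumElems(unsigned numElems, unsigned targetVF,
                                 unsigned elemBits) {
  if (numElems >= targetVF || numElems % 2 == 0 || elemBits < 8)
    return numElems;
  assert(elemBits % 8 == 0 && "element size must be a whole number of bytes");
  return static_cast<unsigned>(nextPow2(numElems));
}
```

With these assumptions, a chain of 3 elements with a target VF of 4 becomes 4, and a chain of 5 with a target VF of 8 becomes 8. Because the guard tests for an odd count, a 6-element chain is returned unchanged.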

I'm clarifying the comments around the extension logic; let me know if you think they are clearer.

https://github.com/llvm/llvm-project/pull/159388