[llvm] [LoadStoreVectorizer] Propagate alignment through contiguous chain (PR #145733)

Thu Jul 10 09:47:35 PDT 2025

================
@@ -1634,3 +1638,32 @@ std::optional<APInt> Vectorizer::getConstantOffset(Value *PtrA, Value *PtrB,
         .sextOrTrunc(OrigBitWidth);
   return std::nullopt;
 }
+
+void Vectorizer::propagateBestAlignmentsInChain(ArrayRef<ChainElem> C) const {
+  auto PropagateAlignments = [](auto ChainIt) {
+    ChainElem BestAlignedElem = *ChainIt.begin();
+    Align BestAlignSoFar = getLoadStoreAlignment(BestAlignedElem.Inst);
+
+    for (const ChainElem &E : ChainIt) {
+      Align OrigAlign = getLoadStoreAlignment(E.Inst);
+      if (OrigAlign > BestAlignSoFar) {
+        BestAlignedElem = E;
+        BestAlignSoFar = OrigAlign;
+        continue;
+      }
+
+      APInt DeltaFromBestAlignedElem =
+          APIntOps::abdu(E.OffsetFromLeader, BestAlignedElem.OffsetFromLeader);
+      // commonAlignment is equivalent to a greatest common power-of-two
+      // divisor; it returns the largest power of 2 that divides both A and B.
+      Align NewAlign = commonAlignment(
+          BestAlignSoFar, DeltaFromBestAlignedElem.getLimitedValue());
+      if (NewAlign > OrigAlign)
+        setLoadStoreAlignment(E.Inst, NewAlign);
+    }
+  };
+
+  // Propagate forwards and backwards.
+  PropagateAlignments(C);
+  PropagateAlignments(reverse(C));
----------------
dakersnar wrote:

> I'm not quite sure if it's literally a chain from the leaf to the root, or actually a graph with all known pointers derived from the same base.

In this specific case, the chain would be split earlier in the algorithm, in splitChainByContiguity, because there are memory gaps between the loads. But for the sake of your question, let's ignore that.

The chain would be a list of elements, each with its offset from the base chain element. It's never a graph. Loads/stores that cannot be reduced to this form are discarded. So your example would look like this:

{Inst = `load i32, base`, Offset = 0}
{Inst = `load i32, gep_16, align 16`, Offset = 16}
{Inst = `load i32, gep_32, align 4`, Offset = 32}
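To make the propagation on this flat list concrete, here is a minimal standalone sketch of the two-pass walk over that three-element chain. It is not the patch itself: `Elem`, `commonAlign`, `propagate`, and `propagateBothWays` are hypothetical names, plain integers stand in for `Align` and `APInt`, and the first load is assumed to have alignment 4 (the example above does not state one).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ChainElem: a known alignment in bytes (power of
// two) and the byte offset from the chain leader.
struct Elem {
  uint64_t Align;
  uint64_t Offset;
};

// Largest power of two that divides both A and Offset, where A is itself a
// power of two. This mirrors what commonAlignment computes.
uint64_t commonAlign(uint64_t A, uint64_t Offset) {
  assert(A != 0 && (A & (A - 1)) == 0 && "alignment must be a power of two");
  uint64_t V = A | Offset;
  return V & (~V + 1); // isolate the lowest set bit
}

// One directional pass: remember the best-aligned element seen so far and
// derive each later element's alignment from its distance to that element.
template <typename It> void propagate(It Begin, It End) {
  if (Begin == End)
    return;
  Elem Best = *Begin;
  for (It I = Begin; I != End; ++I) {
    if (I->Align > Best.Align) {
      Best = *I;
      continue;
    }
    uint64_t Delta = I->Offset > Best.Offset ? I->Offset - Best.Offset
                                             : Best.Offset - I->Offset;
    uint64_t NewAlign = commonAlign(Best.Align, Delta);
    if (NewAlign > I->Align)
      I->Align = NewAlign;
  }
}

// Forward pass, then backward pass, as in the patch.
void propagateBothWays(std::vector<Elem> &Chain) {
  propagate(Chain.begin(), Chain.end());
  propagate(Chain.rbegin(), Chain.rend());
}
```

Under those assumptions, running `propagateBothWays` on `{4, 0}, {16, 16}, {4, 32}` raises the element at offset 32 to 16 on the forward pass (it sits 16 bytes past a 16-aligned access) and raises the element at offset 0 to 16 on the backward pass.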

> What if the order of loads of gep_32 and gep_16 changes? We should end up with the same final alignment regardless of it. With the code iterating over the chain linearly, it may have trouble dealing with propagation across sibling branches.

The basic block order does not matter to the algorithm, at least not after it proves there are no data dependencies in splitChainByMayAliasInstrs. After that point, it sorts the chain by offset, so any permutation of the original IR would deterministically end up as the same chain.
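The order-independence argument can be illustrated with a small sketch. This is only an illustration, not the vectorizer's code: `Access` and `buildChain` are hypothetical names, and the sketch assumes all offsets in a chain are distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a chain element: a label for the instruction
// and its byte offset from the chain leader.
struct Access {
  std::string Name;
  int64_t Offset;
};

// Build the chain from accesses given in program order by sorting on offset.
// With distinct offsets, the result does not depend on the input order.
std::vector<Access> buildChain(std::vector<Access> ProgramOrder) {
  auto ByOffset = [](const Access &A, const Access &B) {
    return A.Offset < B.Offset;
  };
  std::sort(ProgramOrder.begin(), ProgramOrder.end(), ByOffset);
  assert(std::is_sorted(ProgramOrder.begin(), ProgramOrder.end(), ByOffset));
  return ProgramOrder;
}
```

Feeding this `base, gep_16, gep_32` or `gep_32, base, gep_16` produces the same offset-ordered list, so a linear walk over it sees the same sequence either way.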

> With the code iterating over the chain linearly, it may have trouble dealing with propagation across sibling branches.

This would be true if it were represented as a graph, but it is not. It is a list of tuples, each containing an instruction and an offset. Each instruction has been proven to have the same underlying object, and all that matters for the algorithm is the offset from the "head" of the chain, which often points directly to that underlying object.


https://github.com/llvm/llvm-project/pull/145733