[llvm] [LoadStoreVectorizer] Propagate alignment through contiguous chain (PR #145733)

Wed Jul 9 10:39:38 PDT 2025

================
@@ -0,0 +1,450 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=load-store-vectorizer -S < %s | FileCheck %s
+
+; The IR has the first float3 labeled with align 16, and that 16 should
+; be propagated such that the second set of 4 values
+; can also be vectorized together.
----------------
dakersnar wrote:

This InstCombine case is the only one I currently know of that benefits from this specific optimization, but I suspect there _could_ be a different way to arrive at a pattern like this.

But let's ignore those hypotheticals, because I understand the desire to stick within the realm of known use cases. The question is, are we comfortable changing the `unpackLoadToAggregate` algorithm in InstCombine to recurse through nested structs?

https://github.com/llvm/llvm-project/blob/8c32f9517a1207e899ae5276838eb1670e605cba/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L788-L801

I feel like implementing that would be a bit of a mess, and it would lead to questions like "how many layers deep should it recurse?" Unpacking one layer at a time, and then adding the nested struct elements to the IC worklist to be independently operated on in a later iteration, is a much cleaner solution and feels more in line with the design philosophy of InstCombine. I could be wrong though; I'm open to feedback or challenges to my assumptions.

And just to be clear, in case my previous explanation was confusing: the reason it would have to recurse through all layers at once is that we cannot store the knowledge that "the second element of `load struct.float3 align 4` is aligned to 16" on the instruction; there isn't a syntax available to express that.
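
To make the arithmetic concrete, here is a minimal standalone sketch of how the alignment gets lost when unpacking one layer at a time. It does not use LLVM headers; `alignAtOffset` is a hypothetical helper that only mirrors the idea behind `llvm::commonAlignment`, and the layout (outer load with `align 16`, nested `float3` at byte offset 12, its second float at byte offset 4 within the nested struct) is an assumption chosen to match the example above, not taken from the test file.

```cpp
#include <algorithm>
#include <cstdint>

// Largest power of two that divides `offset` (offset must be nonzero).
constexpr uint64_t lowestSetBit(uint64_t offset) {
  return offset & (~offset + 1);
}

// Alignment that can be proven for `base + offset` when all we know is that
// `base` is aligned to `baseAlign`. Hypothetical stand-in for the idea behind
// llvm::commonAlignment.
constexpr uint64_t alignAtOffset(uint64_t baseAlign, uint64_t offset) {
  return offset == 0 ? baseAlign : std::min(baseAlign, lowestSetBit(offset));
}

// Assumed layout, for illustration only.
constexpr uint64_t kOuterAlign = 16;   // alignment on the outer aggregate load
constexpr uint64_t kNestedOffset = 12; // nested float3 starts at byte 12
constexpr uint64_t kFieldOffset = 4;   // second float inside the nested float3

// One layer at a time: the nested struct load can only be marked align 4,
// and everything derived from it afterwards inherits that weaker fact.
constexpr uint64_t nestedLoadAlign = alignAtOffset(kOuterAlign, kNestedOffset);
constexpr uint64_t layeredFieldAlign =
    alignAtOffset(nestedLoadAlign, kFieldOffset);

// All layers at once: the field sits at byte 16 of a 16-aligned base.
constexpr uint64_t flatFieldAlign =
    alignAtOffset(kOuterAlign, kNestedOffset + kFieldOffset);

static_assert(nestedLoadAlign == 4, "nested load only provably align 4");
static_assert(layeredFieldAlign == 4, "layered unpacking loses the align 16");
static_assert(flatFieldAlign == 16, "flat offset from the root keeps align 16");
```

Under these assumptions the layered path ends at `align 4` while the flat path ends at `align 16`, which is the gap that propagating alignment through the contiguous chain in the vectorizer is meant to close.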

https://github.com/llvm/llvm-project/pull/145733