[PATCH] D149893: Rewrite LSV to handle longer chains.

Wed May 17 12:12:09 PDT 2023

jlebar added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:207-208
+
+/// Reorders the instructions that I depends on (the instructions defining its
+/// operands), to ensure they dominate I.
+void reorder(Instruction *I) {
----------------
tra wrote:
> Naive question -- shouldn't this already be the case for the instructions within the same BB?
> 
Yes, but vectorization of a load might break this invariant.

================
Comment at: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:458
+    for (Instruction *I : ToErase) {
+      auto *GEP = dyn_cast<GetElementPtrInst>(getLoadStorePointerOperand(I));
+      if (I->use_empty())
----------------
tra wrote:
> Is there a particular reason GEPs are special cased here? Dont we want to erase other unused arguments of the instructions we're deleting? Are we guaranteed that GEPs are the only argument kinds we're dealing with here? E.g. we may conceivably have an `inttoptr`.
This is just matching the behavior of the original code.

I don't believe GEPs are special, but also I don't think we want to reeimplement a full DCE pass?

================
Comment at: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:689-690
+  //  - Starting from the longest prefix, try to create a vector of that length.
+  //  - If one of them works, great.  Recurse on any remaining elements in the
+  //    chain.
+  //  - If none of them work, discard the first element and recurse on a chain
----------------
tra wrote:
> Is the comment still valid? I'm failing to find where the recursion happens in the code.
Rewrote the comment to clarify that we don't recurse in the C++.

================
Comment at: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:285
+  /// Splits the chain into subchains of instructions which read/write a
+  /// contiguous block of memory.  Doesn't return length-1 chains.
+  std::vector<Chain> splitChainByContiguity(Chain &C);
----------------
tra wrote:
> This raises more questions:
> 
> - What *does* it return if it's given a 1-element chain?
> - Is it allowed not to split a chain (e.g if it's passed a 2-element chain)?
> - If it's given a 3-element chain, does it mean that it will return the input unchanged? What if we do want to split the chain. E.g. if we have 3 64-bit loads, for NVPTX it would be optimal to vectorize them into v2i64 and i64.
> 
> What *does* it return if it's given a 1-element chain?

Empty list.

> Is it allowed not to split a chain (e.g if it's passed a 2-element chain)?

A two-element chain might or might not be split up, depending on whether the instructions read/write a contiguous block of memory.  If it's split up, then we have two length-one chains.  We do not return length-one chains, so we return the empty list.

> If it's given a 3-element chain, does it mean that it will return the input unchanged?

If the three-element chain reads or writes contiguous memory, it will not be split up.  If it does not read/write contiguous memory, it will be split up.

> What if we do want to split the chain. E.g. if we have 3 64-bit loads, for NVPTX it would be optimal to vectorize them into v2i64 and i64.

This happens later, splitChainByAlignment.

================
Comment at: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:294
+  /// Splits the chain into subchains that make legal, aligned accesses.
+  /// Doesn't return length-1 chains.
+  std::vector<Chain> splitChainByAlignment(Chain &C);
----------------
tra wrote:
> "Doesn't return length-1 chains." seems to be a common pattern. It would help to elaborate on that in one place. E.g. something along the lines of "all chain split operations expect at least 2 elements long input chain and will never produce singleton chains as the output. In case input chain can not be split, the original chain is returned."  Note, that I didn't look at the implementation yet, so the details are probably wrong, edit as needed.
> all chain split operations expect at least 2 elements long input chain

They work fine with a one-element chain, it's just easier to remove them before sending them to the split function.

> In case input chain can not be split, the original chain is returned

This is not quite right.  Our goal is not to split up chains; the goal is to keep long chains and break them up only when necessary.

I tried to explain the overall algorithm at the top of the file.  Is there something that I can add at the top of the file that would make this more clear?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149893/new/

https://reviews.llvm.org/D149893