[PATCH] D133584: [DAGCombiner] [AMDGPU] Allow vector loads in MatchLoadCombine

Tue Sep 27 10:03:35 PDT 2022

jrbyrnes added a comment.

In D133584#3818235 <https://reviews.llvm.org/D133584#3818235>, @spatel wrote:

> I'm still not confident in my understanding of the various index values even with the added code comments. 
> I'll try to step through some of these tests in the debugger to get a better idea, but it would be good if another reviewer can have a look too for a second opinion.

Hi -- thanks for your comments and reviews. I realize I never submitted my last round of comments, so I've done so here. Apologies for the delay -- this has probably added to the incomplete understanding.

I have included a description of the various index parameters via example in an inline comment.

================
Comment at: llvm/include/llvm/CodeGen/SelectionDAGAddressAnalysis.h:52
   SDValue getIndex() const { return Index; }
+  void setOffset(int64_t NewOff) { *Offset = NewOff; }
   bool hasValidOffset() const { return Offset.has_value(); }
----------------
RKSimon wrote:
> Would it be better to replace this with addToOffset() that adds to the existing offset (or set it if == None)? 
That does make more sense, thanks!

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:7894
+  // Multiple uses of vector loads are expected
+  if (!Root && !Op.hasOneUse() &&
+      !(Op.getOpcode() == ISD::LOAD && Op.getValueType().isVector()))
----------------
spatel wrote:
> Check that `Depth != 0` instead of adding the Root parameter?
Thanks for pointing this out!

In fact, this patch did not add the Root parameter, but it does look unnecessary, so I think it makes sense to remove it along with the other changes in this patch.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:8349-8350
+
+    // Add the VectorOffset (if any) to the offset of Ptr
+    Ptr.setOffset(Ptr.getOffset() + P->VectorOffset);
+
----------------
spatel wrote:
> spatel wrote:
> > Do we have a test where VectorOffset is non-zero? If not, please add one (add baseline tests as needed, no pre-commit review is necessary).
> This is marked as done, but I can't tell from just looking at the tests which one exercises that path. Add an example/comment here or next to a test, so it's clearer what this looks like?
The non-zero VectorOffset path is covered by any test that uses a ExtractElement with a non-zero index. As an example, `extractelement <4 x i8> %ld, i32 1` will have a VectorOffset of 1. 

CalculateByteProvider will only allow such a VectorOffset if we are trying to Provide for the 1th byte. We enforce this by making sure the StartingIndex == VectorOffset. The idea is to match {0, 1, 2, 3} bytes with VectorLoad -> ExtractElement {0, 1, 2, 3}. If we find an ExtractElement index that does not match the VectorOffset, we conservatively assume that we are shuffling the elements in the vector and can not combine into a load.

So this covers the `VectorIndex` (e.g. VectorOffset) and `StartingIndex` parameters in CalculateByteProvider. The remaining parameter is the original `Index` parameter. This parameter ensures the shift and loadwidth we find are able to provide for the relevant byte. For example, if we are trying to provide for the 2th byte, then we must find either a Load 8+bit -> SHL 16, or Load 16+bit -> SHL 8, or Load 24+bit. Basically, the combination of shift and byte width must cover the byte we are trying to provide for. If so, then the check `(Index >= NarrowByteWidth)` will be false, and we will return the ByteProvider.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133584/new/

https://reviews.llvm.org/D133584