[PATCH] D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads.

Fri May 22 18:14:31 PDT 2020

hliao marked an inline comment as done.
hliao added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7472-7475
+    std::tie(AD, RC) =
+        MFI->getPreloadedValue(AMDGPUFunctionArgInfo::DISPATCH_PTR);
+    if (AD && AD->isRegister() && AD->getRegister() == Reg)
+      return true;
----------------
arsenm wrote:
> hliao wrote:
> > arsenm wrote:
> > > hliao wrote:
> > > > hliao wrote:
> > > > > hliao wrote:
> > > > > > arsenm wrote:
> > > > > > > This will be true for all of the preloaded SGPRs. However I think looking for the argument copies here is the wrong approach. The original intrinsic calls should have been annotated with the align return value attribute. Currently this is a burden on the frontend/library call emitting the intrinsic. We could either annotate the intrinsic calls in one of the later passes (maybe AMDGPUCodeGenPrepare), or add a minimum alignment to the intrinsic definition which will always be applied, similar to how intrinsic declarations already get their other attributes
> > > > > > do we have that attribute available? could you elaborate in detail?
> > > > > OK, I found that.
> > > > In another review [[ https://reviews.llvm.org/D80422 | D80422 ]], relevant intrinsics are being annotated with assumed alignment on the return pointer. Unfortunately, in SelectionDAG, we won't have the facility to keep tracking of that hint. I will enhance `AMDGPUCodeGenPrepare` pass to do that similar thing.
> > > We don't need to explicitly know the pointer alignment. If the intrinsic was properly annotated from the beginning (as would be the case after D80422), the downstream load users would have the correct alignment assigned. The optimizer propagates alignment information already
> > That load won't be marked as DWORD aligned even after those intrinsics are annotated correctly as they are just not DWORD aligned. For example, the newly-added test has an i16 load from pointer (dippatch.ptr + 6). It's not DOWRD aligned. The transformation performed here is to widen such load if that pointer is calculated in the form a DWORD aligned pointer (`base`) with a constant offset (`off`). With that,
> > 
> > ```
> > (i16 load (add base, off))
> > ```
> > 
> > is translated into
> > 
> > ```
> > (trunc (lshr (i32 load (add base, off - 2)))
> > ```
> > We need to know that base is DWORD aligned. However, in SDNode, we don't have any facility to retain the original alignment from LLVM IR.
> is computeKnownBitsForTargetNode called for CopyFromReg? If so, we could move the argument check in there
unfortunately NO

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80364/new/

https://reviews.llvm.org/D80364