[PATCH] D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 21 07:31:20 PDT 2020
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7469
+
+ // Certain preloaded registers are quaranteed to be DWORD aligned.
+ const ArgDescriptor *AD;
----------------
Typo quaranteed
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7472-7475
+ std::tie(AD, RC) =
+ MFI->getPreloadedValue(AMDGPUFunctionArgInfo::DISPATCH_PTR);
+ if (AD && AD->isRegister() && AD->getRegister() == Reg)
+ return true;
----------------
This will be true for all of the preloaded SGPRs. However, I think looking for the argument copies here is the wrong approach. The original intrinsic calls should have been annotated with the align return value attribute. Currently this is a burden on the frontend/library call emitting the intrinsic. We could either annotate the intrinsic calls in one of the later passes (maybe AMDGPUCodeGenPrepare), or add a minimum alignment to the intrinsic definition which would always be applied, similar to how intrinsic declarations already get their other attributes.
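For illustration, the align return value attribute the comment refers to could be attached at the call site (or on the declaration); a hedged sketch:

```llvm
; Hypothetical IR: the align return attribute records that the dispatch
; pointer is DWORD aligned, so no SGPR matching is needed in the backend.
%dp = call align 4 i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
```

With the attribute present, the alignment is visible to any pass or the DAG builder without inspecting which preloaded register the value came from.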
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7509-7550
+ if (Ld->getAlign() < 4) {
+ // Special handling on non-DWORD aligned loads.
+  // So far, handle scalar loads only.
+ if (MemVT.isVector())
+ return SDValue();
+ // Skip non-naturally aligned loads.
+ if (Ld->getAlign() < MemVT.getStoreSize())
----------------
We already have essentially the same code in lowerKernargMemParameter (plus an IR version in AMDGPULowerKernelArguments). This just generalizes to any isBaseWithConstantOffset. Can you refactor these to avoid the duplication?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7535
+ // TODO: Drop only low part of range.
+ NewLoad = DAG.getLoad(ISD::UNINDEXED, ISD::NON_EXTLOAD, MVT::i32, SL,
+ Ld->getChain(), NewPtr, Ld->getOffset(),
----------------
Can you use one of the simpler getLoad overloads? We don't really ever need to mention unindexed
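As a sketch only (names taken from the quoted diff, not a definitive rewrite), the simpler overload would look something like:

```cpp
// Sketch: SelectionDAG::getLoad has an overload that defaults to an
// unindexed, non-extending load, so the ISD enums need not be spelled out.
SDValue NewLoad =
    DAG.getLoad(MVT::i32, SL, Ld->getChain(), NewPtr, Ld->getPointerInfo(),
                Align(4), Ld->getMemOperand()->getFlags(), Ld->getAAInfo());
```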
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7537
+ Ld->getChain(), NewPtr, Ld->getOffset(),
+ Ld->getPointerInfo(), MVT::i32, Align(4),
+ Ld->getMemOperand()->getFlags(), Ld->getAAInfo(),
----------------
I think you could have a better alignment than 4 here
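For instance (a hedged sketch; `AdjustedOffset` stands in for the patch's rounded-down constant offset), the known base alignment could be combined with the new offset instead of pinning the result to 4:

```cpp
// Sketch: commonAlignment() yields the best alignment provable for
// base + AdjustedOffset; it is at least 4 here but may well be larger.
Align NewAlign =
    commonAlignment(Ld->getMemOperand()->getBaseAlign(), AdjustedOffset);
```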
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.dispatch.ptr.ll:32
+}
+
declare noalias i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr() #0
----------------
Should also have some tests with multiple loads. You should make sure that a use of the base address and a use of an offset address both CSE into a single load with one shift.
Could also test with an explicit align attribute on the dispatch ptr call.
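A test along these lines (hypothetical function and value names; the intrinsic is the one already declared in this file) would cover both points:

```llvm
; Hypothetical test: two sub-DWORD loads off the dispatch ptr that should
; CSE into a single dword load plus one shift, with an explicit align
; return attribute on the call site.
define amdgpu_kernel void @dispatch_ptr_two_uses(i16 addrspace(1)* %out) {
  %dp = call align 4 i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
  %lo.ptr = bitcast i8 addrspace(4)* %dp to i16 addrspace(4)*
  %lo = load i16, i16 addrspace(4)* %lo.ptr, align 4
  %hi.gep = getelementptr i8, i8 addrspace(4)* %dp, i64 2
  %hi.ptr = bitcast i8 addrspace(4)* %hi.gep to i16 addrspace(4)*
  %hi = load i16, i16 addrspace(4)* %hi.ptr, align 2
  %sum = add i16 %lo, %hi
  store i16 %sum, i16 addrspace(1)* %out
  ret void
}
```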
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80364/new/
https://reviews.llvm.org/D80364
More information about the llvm-commits mailing list