[PATCH] D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 21 07:31:20 PDT 2020
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7469
+
+ // Certain preloaded registers are quaranteed to be DWORD aligned.
+ const ArgDescriptor *AD;
----------------
Typo quaranteed
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7472-7475
+ std::tie(AD, RC) =
+ MFI->getPreloadedValue(AMDGPUFunctionArgInfo::DISPATCH_PTR);
+ if (AD && AD->isRegister() && AD->getRegister() == Reg)
+ return true;
----------------
This will be true for all of the preloaded SGPRs. However, I think looking for the argument copies here is the wrong approach. The original intrinsic calls should have been annotated with the align return value attribute. Currently this is a burden on the frontend/library call emitting the intrinsic. We could either annotate the intrinsic calls in one of the later passes (maybe AMDGPUCodeGenPrepare), or add a minimum alignment to the intrinsic definition which would always be applied, similar to how intrinsic declarations already get their other attributes.
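For illustration, the align return value attribute the comment refers to could be attached at the call site (or on the declaration); a hedged sketch:

```llvm
; Hypothetical IR: the align return attribute records that the dispatch
; pointer is DWORD aligned, so no SGPR matching is needed in the backend.
%dp = call align 4 i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
```

With the attribute present, the alignment is visible to any pass or the DAG builder without inspecting which preloaded register the value came from.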
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7509-7550
+ if (Ld->getAlign() < 4) {
+ // Special handling on non-DWORD aligned loads.
+  // So far, handle scalar loads only.
+ if (MemVT.isVector())
+ return SDValue();
+ // Skip non-naturally aligned loads.
+ if (Ld->getAlign() < MemVT.getStoreSize())
----------------
We already have essentially the same code in lowerKernargMemParameter (plus an IR version in AMDGPULowerKernelArguments). This just generalizes to any isBaseWithConstantOffset. Can you refactor these to avoid the duplication?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7535
+ // TODO: Drop only low part of range.
+ NewLoad = DAG.getLoad(ISD::UNINDEXED, ISD::NON_EXTLOAD, MVT::i32, SL,
+ Ld->getChain(), NewPtr, Ld->getOffset(),
----------------
Can you use one of the simpler getLoad overloads? We don't really ever need to mention unindexed
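As a sketch only (names taken from the quoted diff, not a definitive rewrite), the simpler overload would look something like:

```cpp
// Sketch: SelectionDAG::getLoad has an overload that defaults to an
// unindexed, non-extending load, so the ISD enums need not be spelled out.
SDValue NewLoad =
    DAG.getLoad(MVT::i32, SL, Ld->getChain(), NewPtr, Ld->getPointerInfo(),
                Align(4), Ld->getMemOperand()->getFlags(), Ld->getAAInfo());
```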
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7537
+ Ld->getChain(), NewPtr, Ld->getOffset(),
+ Ld->getPointerInfo(), MVT::i32, Align(4),
+ Ld->getMemOperand()->getFlags(), Ld->getAAInfo(),
----------------
I think you could have a better alignment than 4 here
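For instance (a hedged sketch; `AdjustedOffset` stands in for the patch's rounded-down constant offset), the known base alignment could be combined with the new offset instead of pinning the result to 4:

```cpp
// Sketch: commonAlignment() yields the best alignment provable for
// base + AdjustedOffset; it is at least 4 here but may well be larger.
Align NewAlign =
    commonAlignment(Ld->getMemOperand()->getBaseAlign(), AdjustedOffset);
```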
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.dispatch.ptr.ll:32
+}
+
declare noalias i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr() #0
----------------
Should also have some tests with multiple loads. You should make sure that a use of the base address and a use of an offset address both CSE into a single load with one shift.
Could also test with an explicit align attribute on the dispatch ptr call.
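A test along these lines (hypothetical function and value names; the intrinsic is the one already declared in this file) would cover both points:

```llvm
; Hypothetical test: two sub-DWORD loads off the dispatch ptr that should
; CSE into a single dword load plus one shift, with an explicit align
; return attribute on the call site.
define amdgpu_kernel void @dispatch_ptr_two_uses(i16 addrspace(1)* %out) {
  %dp = call align 4 i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
  %lo.ptr = bitcast i8 addrspace(4)* %dp to i16 addrspace(4)*
  %lo = load i16, i16 addrspace(4)* %lo.ptr, align 4
  %hi.gep = getelementptr i8, i8 addrspace(4)* %dp, i64 2
  %hi.ptr = bitcast i8 addrspace(4)* %hi.gep to i16 addrspace(4)*
  %hi = load i16, i16 addrspace(4)* %hi.ptr, align 2
  %sum = add i16 %lo, %hi
  store i16 %sum, i16 addrspace(1)* %out
  ret void
}
```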
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80364/new/
https://reviews.llvm.org/D80364
More information about the llvm-commits mailing list