[PATCH] D124219: [AMDGPU] Fine tune LDS misaligned access speed
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 17 14:45:44 PDT 2022
rampitec added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1598
if (IsFast)
- *IsFast= Alignment >= RequiredAlignment || Alignment < Align(4);
+ *IsFast = (Alignment >= RequiredAlignment) ? 128
+ : (Alignment < Align(4)) ? 32
----------------
arsenm wrote:
> What do the numbers mean?
More or less 'it operates at a speed comparable to an N-bit wide load'. With full alignment, ds128 is slower than ds96, for example. If underaligned, the access is comparable in speed to a single dword access, which would then mean 32 < 128, and it is faster to issue a wide load regardless. 1 is simply 'slow, don't do it'. I.e., when comparing an aligned load to a wider load that would no longer be aligned, the latter is slower.
But essentially it is just a rank; the values are not additive.
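
To illustrate the ranking idea, here is a minimal standalone sketch. It is not the code from SIISelLowering.cpp. The names `ldsAccessRank` and `widerIsNotSlower` are hypothetical, alignments are plain byte counts rather than `Align`, and the final `return 1` is an assumption based on the '1 is slow' remark above, since the quoted diff does not show that branch.

```cpp
#include <cstdint>

// Hypothetical helper (not the LLVM API): maps an access alignment to a
// speed rank, read as "roughly as fast as an N-bit wide load".
unsigned ldsAccessRank(uint64_t AlignBytes, uint64_t RequiredAlignBytes) {
  if (AlignBytes >= RequiredAlignBytes)
    return 128; // fully aligned
  if (AlignBytes < 4)
    return 32;  // underaligned: comparable to a single dword access
  return 1;     // assumption: everything else is 'slow, don't do it'
}

// Ranks are only compared with each other and never summed.
bool widerIsNotSlower(unsigned NarrowRank, unsigned WideRank) {
  return WideRank >= NarrowRank;
}
```

A caller would compare the rank of the access it has with the rank of the wider access it is considering, and widen only if the rank does not drop.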
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124219/new/
https://reviews.llvm.org/D124219