[PATCH] D124219: [AMDGPU] Fine tune LDS misaligned access speed
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 17 15:07:23 PDT 2022
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1598
if (IsFast)
- *IsFast= Alignment >= RequiredAlignment || Alignment < Align(4);
+ *IsFast = (Alignment >= RequiredAlignment) ? 128
+ : (Alignment < Align(4)) ? 32
----------------
rampitec wrote:
> arsenm wrote:
> > What do the numbers mean?
> More or less: 'it operates at a speed comparable to an N-bit wide load'. For example, with full alignment ds128 is slower than ds96. If underaligned, it is comparable in speed to a single dword access, which means 32 < 128, and it is faster to issue a wide load regardless. 1 simply means 'slow, don't do it'. That is, when comparing an aligned load to a wider load that will no longer be aligned, the latter is slower.
>
> But essentially it is just a rank; these values are not additive.
This needs to be commented.
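To illustrate the rank semantics described in the reply, here is a minimal C++ sketch. It is not the code from the patch: `ldsSpeedRank` and `isFasterThan` are hypothetical names, alignments are plain byte counts, and returning 1 for the case the snippet cuts off (alignment of at least 4 bytes but below the required alignment) is an assumption based on the reply's description of 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): the speed rank assigned to an LDS
// access, given its actual alignment and the alignment it would ideally need,
// both in bytes. Higher means faster; the numbers are only a ranking.
unsigned ldsSpeedRank(std::uint64_t alignBytes, std::uint64_t requiredAlignBytes) {
  assert(alignBytes != 0 && requiredAlignBytes != 0 && "alignments must be nonzero");
  if (alignBytes >= requiredAlignBytes)
    return 128; // Fully aligned.
  if (alignBytes < 4)
    return 32;  // Below dword alignment: roughly one dword (32-bit) access.
  return 1;     // Assumption: the elided final branch, "slow, don't do it".
}

// Ranks are ordered, not additive: compare them, never sum them.
bool isFasterThan(unsigned rankA, unsigned rankB) { return rankA > rankB; }
```

For example, an 8-byte access with 8-byte alignment and a required alignment of 8 ranks 128, while widening it to an access that keeps 8-byte alignment but (by assumption here) requires 16 ranks 1, so the narrower aligned access wins the comparison.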
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124219/new/
https://reviews.llvm.org/D124219