[PATCH] D124219: [AMDGPU] Fine tune LDS misaligned access speed

Tue May 17 15:07:23 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1598
         if (IsFast)
-          *IsFast= Alignment >= RequiredAlignment || Alignment < Align(4);
+          *IsFast = (Alignment >= RequiredAlignment) ? 128
+                    : (Alignment < Align(4))         ? 32
----------------
rampitec wrote:
> arsenm wrote:
> > What do the numbers mean?
> More or less, it means 'operates at a speed comparable to an N-bit wide load'. For example, with full alignment ds128 is slower than ds96. If underaligned, the access is comparable in speed to a single dword access, which then means 32 < 128, and it is faster to issue a wide load regardless. A value of 1 simply means 'slow, don't do it'. That is, when comparing an aligned load to a wider load that will no longer be aligned, the latter is slower.
> 
> But essentially it is just a rank; the values are not additive.
This needs to be explained in a code comment.
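The reply explains the values only in prose, so here is a minimal C++ sketch of the rank as a standalone function, with the meaning of each value spelled out in comments. It assumes the truncated third branch of the ternary yields 1, as the reply describes. `ldsAccessRank`, `preferWideAccess`, and the `kRank*` constants are hypothetical names, alignments are plain byte counts rather than `llvm::Align`, and the tie-breaking policy in `preferWideAccess` is an illustrative assumption, not what the patch does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone model of the "IsFast" rank from the review thread.
// Not the LLVM API: real code uses llvm::Align and an unsigned* out-parameter.
// Alignments here are plain byte counts.

// Per the thread, a rank N means "operates at a speed comparable to an N-bit
// wide load". Ranks are only ordered against each other, never added together.
constexpr unsigned kRankSlow = 1;         // assumed third branch: slow, don't do it
constexpr unsigned kRankDwordSpeed = 32;  // alignment < 4: about single-dword speed
constexpr unsigned kRankFullSpeed = 128;  // alignment meets the requirement

static bool isPowerOfTwo(std::uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Mirrors the order of checks in the nested ternary from the patch.
unsigned ldsAccessRank(std::uint64_t alignment, std::uint64_t requiredAlignment) {
  assert(isPowerOfTwo(alignment) && isPowerOfTwo(requiredAlignment));
  if (alignment >= requiredAlignment)
    return kRankFullSpeed;
  if (alignment < 4)
    return kRankDwordSpeed;
  return kRankSlow;
}

// Hypothetical policy for consuming the ranks: take the wider access only if
// it ranks no worse than the narrower one (ties go to the single wide access).
bool preferWideAccess(unsigned narrowRank, unsigned wideRank) {
  return wideRank >= narrowRank;
}
```

For example, a narrow load that is 8-byte aligned with an 8-byte requirement ranks 128, while merging it into a wider access that needs 16-byte alignment ranks 1, so `preferWideAccess(128, 1)` rejects the wide access. This matches the reply's point that the wider, no-longer-aligned load is slower.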


https://reviews.llvm.org/D124219