[PATCH] D123524: [AMDGPU] Split unaligned 3 DWORD DS operations

Tue Apr 12 08:02:59 PDT 2022

rampitec marked an inline comment as done.
rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1564
+        if (IsFast)
+          *IsFast = Alignment >= RequiredAlignment || Alignment < Align(4);
+        return true;
----------------
foad wrote:
> Note that `Alignment < Align(4)` does not prove that the address is not dword aligned, just that the compiler does not know it's dword aligned. But I guess this is the best we can do for now.
Right. In that case, if the address is misaligned by 1 or 2 bytes, a single instruction is faster. If it is misaligned by 4 or 8 bytes, splitting into 32-bit instructions would be slightly faster, but that is exactly what we cannot know. If the address is actually aligned, the single instruction is the fastest option. Without the `Align(4)` check, though, we would have to split it into b8 instructions, and that would be really slow in every scenario.
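
To make the cases concrete, here is a minimal standalone sketch of the `IsFast` expression from the hunk above. It is not the code in SIISelLowering.cpp: `isFastUnsplit` is a hypothetical helper, alignments are modeled as plain byte counts instead of `llvm::Align`, and the required alignment of 16 bytes is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the IsFast expression in the patch.
// KnownAlign is the alignment the compiler can prove, in bytes;
// RequiredAlign is the alignment the single wide instruction wants.
constexpr bool isFastUnsplit(uint64_t KnownAlign, uint64_t RequiredAlign) {
  return KnownAlign >= RequiredAlign || KnownAlign < 4;
}

// Assumed required alignment, for illustration only.
constexpr uint64_t AssumedRequiredAlign = 16;

// Known to be sufficiently aligned: keep the single instruction.
static_assert(isFastUnsplit(16, AssumedRequiredAlign), "aligned access is fast");
// Known dword aligned but below the requirement: not reported as fast,
// so the access can be split into 32-bit pieces.
static_assert(!isFastUnsplit(8, AssumedRequiredAlign), "align 8 is split");
static_assert(!isFastUnsplit(4, AssumedRequiredAlign), "align 4 is split");
// Dword alignment unknown: report fast, so the access is not split into b8 pieces.
static_assert(isFastUnsplit(2, AssumedRequiredAlign), "align 2 stays whole");
static_assert(isFastUnsplit(1, AssumedRequiredAlign), "align 1 stays whole");
```

The `KnownAlign < 4` term is what keeps the under-aligned cases on a single instruction; dropping it would mark them as not fast as well.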

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123524/new/

https://reviews.llvm.org/D123524