[PATCH] D123524: [AMDGCN] Split unaligned 3 DWORD DS operations

Tue Apr 12 01:25:42 PDT 2022

foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

Looks OK to me. But there will always be benchmarks that go faster and slower with any change like this, because the compiler does not have perfect knowledge about the (mis)alignment of all data.

================
Comment at: llvm/lib/Target/AMDGPU/DSInstructions.td:880

-// FIXME: From performance point of view, is ds_read_b96/ds_write_b96 better choice
-// for unaligned accesses?
+// Selection will split most of the unaligned 3 dword acceses due to performace
+// reasons when beneficial. Keep these two patterns for the rest of the cases.
----------------
Typo "accesses", "performance"

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1564
+        if (IsFast)
+          *IsFast = Alignment >= RequiredAlignment || Alignment < Align(4);
+        return true;
----------------
Note that `Alignment < Align(4)` does not prove that the address is not dword aligned, just that the compiler does not know it's dword aligned. But I guess this is the best we can do for now.
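
To illustrate the point above, here is a minimal standalone sketch, not the code in SIISelLowering.cpp. The names `isFastUnaligned96` and `actualAlign` are hypothetical, plain byte counts stand in for `llvm::Align`, and the required alignment of 16 bytes used in the usage note is an assumption. It shows that the predicate only sees the alignment the compiler can prove, which may be lower than the alignment the address really has at run time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the predicate in the quoted hunk. Alignments are
// given in bytes; KnownAlign is what the compiler can prove, not necessarily
// what the address has at run time.
inline bool isFastUnaligned96(uint64_t KnownAlign, uint64_t RequiredAlign) {
  return KnownAlign >= RequiredAlign || KnownAlign < 4;
}

// Largest power-of-two alignment (capped at 16 bytes) that a concrete
// address actually has. Only observable at run time.
inline uint64_t actualAlign(uint64_t Addr) {
  uint64_t A = 1;
  while (A < 16 && Addr % (A * 2) == 0)
    A *= 2;
  return A;
}
```

Under the assumed required alignment of 16, `isFastUnaligned96(1, 16)` is true even when the pointer later turns out to hold an address such as 0x1000, for which `actualAlign` returns 16. The predicate treats the access as "not dword aligned" only because nothing better is known.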

https://reviews.llvm.org/D123524