[PATCH] D123524: [AMDGCN] Split unaligned 3 DWORD DS operations

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 12 01:25:42 PDT 2022


foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

Looks OK to me. But any change like this will always make some benchmarks faster and others slower, because the compiler does not have perfect knowledge about the (mis)alignment of all data.



================
Comment at: llvm/lib/Target/AMDGPU/DSInstructions.td:880
 
-// FIXME: From performance point of view, is ds_read_b96/ds_write_b96 better choice
-// for unaligned accesses?
+// Selection will split most of the unaligned 3 dword acceses due to performace
+// reasons when beneficial. Keep these two patterns for the rest of the cases.
----------------
Typos: "acceses" should be "accesses", and "performace" should be "performance".


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1564
+        if (IsFast)
+          *IsFast = Alignment >= RequiredAlignment || Alignment < Align(4);
+        return true;
----------------
Note that `Alignment < Align(4)` does not prove that the address is not dword aligned, just that the compiler does not know it's dword aligned. But I guess this is the best we can do for now.
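
To illustrate the point, here is a minimal standalone sketch, not the actual LLVM code. `isFastPerHeuristic` and `actualAlignment` are hypothetical helpers that use plain integers in place of `llvm::Align`, and the required alignment of 16 bytes in the usage note is an assumption made only for the example. The first helper mirrors the quoted expression; the second computes what a concrete address is really aligned to, which the compiler generally cannot see.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the quoted expression:
//   Alignment >= RequiredAlignment || Alignment < Align(4)
// KnownAlign is the alignment the compiler can prove, i.e. only a lower
// bound on the alignment the address has at run time.
bool isFastPerHeuristic(uint64_t KnownAlign, uint64_t RequiredAlign) {
  // Alignments are non-zero powers of two.
  assert(KnownAlign != 0 && (KnownAlign & (KnownAlign - 1)) == 0);
  return KnownAlign >= RequiredAlign || KnownAlign < 4;
}

// Hypothetical helper: the largest power of two (capped at 16) that
// divides a concrete run-time address.
uint64_t actualAlignment(uint64_t Address) {
  uint64_t A = 1;
  while (A < 16 && Address % (A * 2) == 0)
    A *= 2;
  return A;
}
```

With an assumed required alignment of 16, a known alignment of 2 takes the `Alignment < Align(4)` branch, yet an address such as `0x1000` that the compiler only knows to be 2-byte aligned is in fact 16-byte aligned (`actualAlignment(0x1000) == 16`). The check therefore reflects missing knowledge, not proven misalignment.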


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123524/new/

https://reviews.llvm.org/D123524


