[PATCH] D96421: [AMDGPU] Better selection of base offset when merging DS reads/writes

Wed Feb 10 08:12:40 PST 2021

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:802
   // Try to shift base address to decrease offsets.
-  unsigned OffsetDiff = std::abs((int)EltOffset1 - (int)EltOffset0);
-  CI.BaseOff = std::min(CI.Offset, Paired.Offset);
+  unsigned Min = std::min(EltOffset0, EltOffset1);
+  unsigned Max = std::max(EltOffset0, EltOffset1);
----------------
This should probably use uint32_t throughout.

================
Comment at: llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:805

-  if ((OffsetDiff % 64 == 0) && isUInt<8>(OffsetDiff / 64)) {
+  unsigned Mask = maskTrailingOnes<unsigned>(8) * 64;
+  if (((Max - Min) & ~Mask) == 0) {
----------------
Mask should be const.
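
For illustration, a minimal standalone sketch of the new check with the two suggestions above applied (uint32_t, const). The helper names fitsStride64 and fitsStride64Old are hypothetical and not part of the patch, and the literal 0xFF stands in for maskTrailingOnes<unsigned>(8). The old form is kept next to it only to show that both accept the same differences: a multiple of 64 whose quotient fits in 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the patch): the new mask-based check.
// Assumes the caller passes Min <= Max, as std::min/std::max would give.
inline bool fitsStride64(uint32_t Min, uint32_t Max) {
  assert(Min <= Max && "expected ordered offsets");
  // Eight set bits scaled by 64 cover bits 6..13: 0xFF * 64 == 0x3FC0.
  const uint32_t Mask = UINT32_C(0xFF) * 64;
  // No bits outside the mask means: multiple of 64, quotient below 256.
  return ((Max - Min) & ~Mask) == 0;
}

// Hypothetical helper (not in the patch): the form being removed,
// with isUInt<8>(x) written out as x <= 0xFF.
inline bool fitsStride64Old(uint32_t Diff) {
  return (Diff % 64 == 0) && (Diff / 64 <= 0xFF);
}
```

Both forms reject a difference of 65 (not a multiple of 64) and of 256 * 64 (quotient needs 9 bits).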

================
Comment at: llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:814-816
+      unsigned BaseOff =
+          Hi & (maskLeadingOnes<unsigned>(countLeadingZeros(Lo ^ Hi) + 1) |
+                maskTrailingOnes<unsigned>(6));
----------------
This bit magic is a bit fancy. Can you extract it to a separate function?
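
One possible shape for such a function, as a standalone sketch rather than the patch's actual code. The name keepBitsThroughFirstDifference is hypothetical, and the three small helpers are local stand-ins for the llvm/Support/MathExtras.h utilities so the example compiles without LLVM. The sketch assumes Lo != Hi, since otherwise the leading-ones count would exceed 32; whether the surrounding code guarantees that is not shown in the quoted diff.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for maskTrailingOnes<uint32_t>(N), valid for N in [0, 32].
inline uint32_t maskTrailingOnes32(unsigned N) {
  return N == 0 ? 0 : UINT32_MAX >> (32 - N);
}

// Local stand-in for maskLeadingOnes<uint32_t>(N), valid for N in [0, 32].
inline uint32_t maskLeadingOnes32(unsigned N) {
  return ~maskTrailingOnes32(32 - N);
}

// Local stand-in for countLeadingZeros on 32 bits; returns 32 for X == 0.
inline unsigned countLeadingZeros32(uint32_t X) {
  unsigned N = 0;
  for (uint32_t Bit = UINT32_C(1) << 31; Bit != 0 && (X & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Hypothetical name. Keeps the bits of Hi from the top bit down through the
// highest bit in which Lo and Hi differ, plus the low 6 bits of Hi, and
// clears everything in between.
inline uint32_t keepBitsThroughFirstDifference(uint32_t Lo, uint32_t Hi) {
  assert(Lo != Hi && "sketch assumes the two values differ");
  const unsigned KeptLeading = countLeadingZeros32(Lo ^ Hi) + 1;
  return Hi & (maskLeadingOnes32(KeptLeading) | maskTrailingOnes32(6));
}
```

For example, with Lo = 0x100 and Hi = 0x1F40 the highest differing bit is bit 12, so the result is 0x1000. A named function like this also gives the comment explaining the trick a natural home.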

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96421/new/

https://reviews.llvm.org/D96421