[PATCH] D81085: [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size

Fri Jun 5 01:36:29 PDT 2020

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:472-473
+         "Invalid NumLoads/NumBytes values");
+  unsigned AverageBytesPerLoad = NumBytes / NumLoads;
+  unsigned MaxNumLoads = AverageBytesPerLoad <= 4 ? 5 : 4;
+  return NumLoads <= MaxNumLoads;
----------------
I would suggest avoiding this division (because division is slow and introduces rounding). And the magic numbers need some kind of explanation. How did you come up with them?

So how about reformatting this as:
```
if (NumBytes <= 4 * NumLoads) {
  // Loads are dword or smaller (on average).
  MaxNumLoads = 5;
} else {
  // Loads are bigger than a dword (on average).
  MaxNumLoads = 4;
}
```
... plus some explanation of where the numbers 5 and 4 came from, and why they should be different cases.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81085/new/

https://reviews.llvm.org/D81085