[PATCH] D81085: [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 5 01:36:29 PDT 2020


foad added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:472-473
+         "Invalid NumLoads/NumBytes values");
+  unsigned AverageBytesPerLoad = NumBytes / NumLoads;
+  unsigned MaxNumLoads = AverageBytesPerLoad <= 4 ? 5 : 4;
+  return NumLoads <= MaxNumLoads;
----------------
I would suggest avoiding this division (because division is slow and introduces rounding). And the magic numbers need some kind of explanation. How did you come up with them?

So how about reformatting this as:
```
if (NumBytes <= 4 * NumLoads) {
  // Loads are dword or smaller (on average).
  MaxNumLoads = 5;
} else {
  // Loads are bigger than a dword (on average).
  MaxNumLoads = 4;
}
```
... plus some explanation of where the numbers 5 and 4 came from, and why they should be different cases.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81085/new/

https://reviews.llvm.org/D81085





More information about the llvm-commits mailing list