[PATCH] D81085: [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 5 01:36:29 PDT 2020
foad added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:472-473
+ "Invalid NumLoads/NumBytes values");
+ unsigned AverageBytesPerLoad = NumBytes / NumLoads;
+ unsigned MaxNumLoads = AverageBytesPerLoad <= 4 ? 5 : 4;
+ return NumLoads <= MaxNumLoads;
----------------
I would suggest avoiding this division (because division is slow and introduces rounding). And the magic numbers need some kind of explanation. How did you come up with them?
So how about reformatting this as:
```
if (NumBytes <= 4 * NumLoads) {
// Loads are dword or smaller (on average).
MaxNumLoads = 5;
} else {
// Loads are bigger than a dword (on average).
MaxNumLoads = 4;
}
```
... plus some explanation of where the numbers 5 and 4 came from, and why they should be different cases.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D81085/new/
https://reviews.llvm.org/D81085
More information about the llvm-commits
mailing list