[PATCH] D81085: [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Mahesha S via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 5 02:09:27 PDT 2020
hsmhsm marked an inline comment as done.
hsmhsm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:472-473
+ "Invalid NumLoads/NumBytes values");
+ unsigned AverageBytesPerLoad = NumBytes / NumLoads;
+ unsigned MaxNumLoads = AverageBytesPerLoad <= 4 ? 5 : 4;
+ return NumLoads <= MaxNumLoads;
----------------
foad wrote:
> I would suggest avoiding this division (because division is slow and introduces rounding). And the magic numbers need some kind of explanation. How did you come up with them?
>
> So how about reformatting this as:
> ```
> if (NumBytes <= 4 * NumLoads) {
> // Loads are dword or smaller (on average).
> MaxNumLoads = 5;
> } else {
> // Loads are bigger than a dword (on average).
> MaxNumLoads = 4;
> }
> ```
> ... plus some explanation of where the numbers 5 and 4 came from, and why they should be different cases.
Your suggestions make sense to me. And regarding those magic number 4 and 5, there is no analytical thinking here. It is purely based on experimentation to achieve below three goals.
(1) LLVM lit regressions should be as minimal as possible
(2) OpenCL shoc benchmark should not show any performance degradation (a kind of representative benchmark for `compute`)
(3) The performance issue in question (rocSPARSE benchmark) should show improvements.
After experimentation with different magic numbers, and different heuristics, this is what I could settle down with.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D81085/new/
https://reviews.llvm.org/D81085
More information about the llvm-commits
mailing list