[PATCH] D124671: [AMDGPU] Do not raise wave priority beyond a specific number of VALU instructions.
Ivan Kosarev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 12 04:15:06 PDT 2022
kosarev marked 2 inline comments as done.
kosarev added a comment.
In D124671#3698865 <https://reviews.llvm.org/D124671#3698865>, @nhaehnle wrote:
> I don't quite follow the cross-basic-block logic. Given what I understand of the goal of the heuristic, I expect it to be: "insert a s_setprio 0 before the first long section of dense VALU that can happen after a VMEM load (if any)".
>
> Instead of `MBBInfo::LastVMEMLoad` I can see a `MBBInfo::LastVALUSequenceBreak` and an `MBBInfo::PastVMEMLoad` boolean to indicate whether a VMEM load can have been issued at the end of each basic block.
I think knowing if we are at a point past a VMEM load wouldn't help as there may be another VMEM load down the control flow. And a similar problem with the last sequence break as at where the sequence finally becomes 'long enough', we can have several points at which it begins.
If I'm not wrong that there is no harm in lowering the priority immediately after VMEM loads, it seems it's still easier implementation-wise to track them and not other things. The updated version just makes sure we only consider VMEM loads that are followed by VALU sequences of the required length. Then the rest of the logic remains the same.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:127-128
+ break;
+ if (SIInstrInfo::isVMEM(MI) && MI.mayLoad())
+ LastVMEMLoad = &MI;
+ if (SIInstrInfo::isVALU(MI))
----------------
kosarev wrote:
> nhaehnle wrote:
> > I believe NumVALUInsts should be reset to 0 here. It should probably also be reset at a number of other events, in particular DS instructions.
> >
> > The reasoning is that we want to lower priority just before running a long dense block of VALU, so that other waves have a better chance of running address calculation VALU.
> So would something like this work for the purpose?
>
> ```
> if (SIInstrInfo::isVALU(MI))
> ++NumVALUInsts;
> else
> NumVALUInsts = 0;
> ```
Updated to break the sequence on DS instructions.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:30-31
+static cl::opt<unsigned> VALUInstsThreshold(
+ "amdgpu-set-wave-priority-valu-insts-threshold",
+ cl::desc("VALU instruction count threshold for adjusting wave priority"),
+ cl::init(100), cl::Hidden);
----------------
nhaehnle wrote:
> kosarev wrote:
> > arsenm wrote:
> > > Should we be counting cycles instead of instructions?
> > We don't know it yet, I'm afraid. For the couple real use cases that we have for the issue that this pass tries to address counting instructions looks sufficient. This being coupled with that proper cycle counting might be not very trivial, we may be at the risk of over-engineering here.
> Could this also be configured via a function attribute? Those are less problematic than command-line options in a driver context.
Done.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124671/new/
https://reviews.llvm.org/D124671
More information about the llvm-commits
mailing list