[PATCH] D124246: [AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.

Tue Apr 26 04:23:47 PDT 2022

kosarev added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:95
+    return false;
+
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
----------------
foad wrote:
> I think it only makes sense to run this pass on kernels (i.e. the entry point of a wave), not on subfunctions, so you should bail out if `!AMDGPU::isKernel(MF.getFunction().getCallingConv())` or similar.
Would `AMDGPU::isEntryFunctionCC()` be the right function for that?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:131
+    if (MBBInfos[&MBB].MayReachVMEMLoad) {
+      if (MBB.isReturnBlock())
+        PriorityLoweringBlocks.insert(&MBB);
----------------
foad wrote:
> I could be wrong but I don't think kernels have a normal "return" at the end. Maybe change this to `MBB.succ_empty()` to make it a bit more generic?
Changed. For `amdgpu_ps` shaders in the test I do see return MIs, though they seem to be emitted just as `; return to shader part epilog` comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:136
+
+    if (CanLowerPriorityDirectlyInPredecessors(MBB, MBBInfos)) {
+      for (MachineBasicBlock *Pred : MBB.predecessors()) {
----------------
foad wrote:
> A very minor point: you only lower priority in any predecessor if you can do it in all predecessors. Would it make sense to lower priority in some predecessors even if you can't do it in all of them?
Yeah, good question. My analysis here is that while we know that would cost us extra time executing the `s_setprio 0` in `MBB` when control comes from one of such predecessors, I presently do not have any evidence at hand that that would help cases considered critical or at least important. The thinking therefore was that strategically we might prefer let the problem, if there is any, reveal itself and via that get a concrete reproducer, rather than to speculate early and never know for sure if it's actually necessary.

I probably should clarify that I see this patch as an attempt to introduce a reliable ground for developing our understanding of the nature of the original issue (which I guess may take some time testing and collecting relevant use cases), and not an implementation that tries to do the best thing for all theoretically possible cases right away.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124246/new/

https://reviews.llvm.org/D124246