[PATCH] D124671: [AMDGPU] Only raise wave priority if there is a long enough sequence of VALU instructions.

Thu Sep 1 04:59:04 PDT 2022

nhaehnle added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:159-163
+    unsigned MaxNumVALUInsts =
+        std::max(MaxNumVALUInstsInMiddle, NumVALUInstsAtEnd);
+    Info.MayReachVMEMLoad =
+        SuccsMayReachVMEMLoad ||
+        (Info.LastVMEMLoad && MaxNumVALUInsts >= VALUInstsThreshold);
----------------
kosarev wrote:
> nhaehnle wrote:
> > If the winner is MaxNumVALUInstsInMiddle, I believe the relevant VMEMLoad should be the VMEMLoad that appeared just before the corresponding sequence of VALU instead of the last one.
> > 
> > I think that instead of Info.LastVMEMLoad, we'd want to have Info.LastVMEMLoadBeforeLongVALU (feel free to think of a better name), which is only set here at the end if the threshold is exceeded. The loop above would keep track of the most recently seen VMEMLoad as well as the one corresponding to the longest VALU sequence in the middle so far.
> The ticket says the priority is supposed to be lowered after the last VMEM load, and this updated implementation has been prepared in assumption that counting VALUs doesn't affect that principle. So if this point stands, then I understand we should not be interested in any VALU sequences preceding VMEM loads, even if long enough.
> 
> In terms of the code, as we reset `MaxNumVALUInstsInMiddle` every time we ran into a VMEM load, I'm not sure I see how an in-the-middle sequence followed by a VMEM load can possibly be the winner.
> 
> `Info.LastVMEMLoad` storing the actual last VMEM load in the block should not be a problem because it is the `Info.MayReachVMEMLoad` flag that the following analysis takes into account. And speaking of naming, I'm not perfectly happy with the the name of that flag as it actually means 'may reach any of the last VMEM loads that precede a long-enough sequence of VALU instructions'. Would appreciate if anyone can suggest a better alternative of a practical length.
I did think the policy was supposed to be "lower priority before the longest sequence of VALU instructions, if the sequence length crosses a certain threshold". But we can start with "lower after last VMEM load, if subsequent VALU sequence is long enough". So then, you're right and the code is okay as-is.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124671/new/

https://reviews.llvm.org/D124671