[PATCH] D115747: [AMDGPU] Flush the vmcnt counter in loop preheader when necessary

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 1 05:25:06 PDT 2022


foad added a comment.

Overall I think this looks really good. None of my inline comments are blockers.

One general observation: the waitcnt that you insert in a preheader is always vmcnt(0), but there are cases where vmcnt(1) or higher could be useful, e.g.:

  v0 = vmem load ...
  v1 = vmem load ...
  loop:
    use of v0
    v2 = vmem load ...
    (no use of v1 or v2 inside loop)

Could you at least add a test case like that, even if you don't want to implement the vmcnt(1) in the preheader yet?



================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1668
+// the given loop. We currently decide to flush in two situations:
+// 1. The loop contains vmem store(s), no vmem load and at least one use of a
+//    vgpr containing a value that is loaded outside of the loop. (Only on
----------------
>From the description of cases (1) and (2), neither of them seems to include this case for GFX9:
```
  v0 = vmem load ...
  loop:
    use of v0
    vmem store ...
    v1 = vmem load ...
    no use of v1 in loop
```


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1671
+//    targets with no vscnt counter).
+// 2. The loop contains vmem load(s), but the loaded values are not used in the
+//    loop, and at least one use of a vgpr containing a value that is loaded
----------------
Does this case (2) also apply to other kinds of loads, e.g. for SMEM loads would it be better to pull the lgkmcnt(0) out of the loop?


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1677
+                                        WaitcntBrackets &Brackets) {
+  if (!ML->getLoopPreheader())
+    return false;
----------------
Could just assert that there is a preheader? (But do you need to?)


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1686
+
+  for (MachineBasicBlock *const MBB : ML->blocks()) {
+    for (MachineInstr &MI : *MBB) {
----------------
We don't usually mark loop variables as `const` - just remove this?


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1729
+    return true;
+  bool flush = HasVMemLoad && UsesVgprLoadedOutside;
+  if (!ST->hasVscnt())
----------------
Upper case F in Flush.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115747/new/

https://reviews.llvm.org/D115747



More information about the llvm-commits mailing list