[PATCH] D115747: [AMDGPU] Flush the vmcnt counter in loop preheader when necessary

Baptiste Saleil via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 30 14:07:14 PDT 2022


bsaleil added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1629
 
+  if (Block.getFirstTerminator() == Block.end() &&
+      isPreheaderToFlush(Block, ScoreBrackets))
----------------
foad wrote:
> It is a shame that you have to implement this in two places, for blocks with and without terminators. I'm not sure if there is a better way. Maybe generateWaitcntInstBefore could be changed to take an iterator (which is allowed to be `end()`) instead of MI, so you would not need the new function generateWaitcntBlockEnd. But that would be quite invasive.
Yes, unfortunately I also think changing that would be too pervasive. generateWaitcntInstBefore relies a lot on the fact that MI is a valid instruction.


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1657-1658
+    return UpdateCache(false);
+  while (MachineLoop *Parent = Loop->getParentLoop())
+    Loop = Parent;
+
----------------
foad wrote:
> I don't quite understand the logic here. Why do you only check whether MBB is the preheader of the outermost loop? Could it not be the preheader of one of the inner loops?
I think it was code from the other implementation. I removed it, now inner loops are also optimized.


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1668
+// the given loop. We currently decide to flush in two situations:
+// 1. The loop contains vmem store(s), no vmem load and at least one use of a
+//    vgpr containing a value that is loaded outside of the loop. (Only on
----------------
foad wrote:
> From the description of cases (1) and (2), neither of them seems to include this case for GFX9:
> ```
>   v0 = vmem load ...
>   loop:
>     use of v0
>     vmem store ...
>     v1 = vmem load ...
>     no use of v1 in loop
> ```
I removed the restriction that for 2. on GFX9 we should not have any store. I'm not sure why I added that in the first place. So now this case is supported.


================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir:56
+# GFX9-LABEL: waitcnt_vm_loop
+# GFX9-LABEL: bb.0:
+# GFX9: S_WAITCNT 39
----------------
foad wrote:
> It's pretty unusual to use the -LABEL suffix for an actual label. Normally it is only used for the start of each function. Did you have a special reason for doing it this way?
This allows me to match specific s_waitcnt in preheaders or loop bodies, to avoid inadvertently matching the wrong waitcnt. E.g. matching a loop body waitcnt that was actually expected in the preheader.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115747/new/

https://reviews.llvm.org/D115747



More information about the llvm-commits mailing list