[PATCH] D130313: [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 22 05:34:32 PDT 2022


foad added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1733
         // Vgpr use
         if (Op.isUse()) {
           for (int RegNo = Interval.first; RegNo < Interval.second; ++RegNo) {
----------------
Maybe use `.readsReg()` here? See comments about `undef` in the test case. (Note that readsReg is not mutually exclusive with isDef, since a subreg def operand both reads and writes the undeerlying super reg. So you might also need to remove the "else" on line 1752.)


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1742
             // value is likely loaded outside of the loop.
-            if (Brackets.getRegScore(RegNo, VM_CNT) > 0) {
+            unsigned Score = Brackets.getRegScore(RegNo, VM_CNT);
+            AMDGPU::Waitcnt Wait;
----------------
Maybe you just need: `if (Brackets.getRegScore(RegNo, VM_CNT) > Brackets.getScoreLB(VM_CNT))`?


================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir:570
+
+    renamable $vgpr5 = BUFFER_LOAD_DWORD_OFFEN undef renamable $vgpr0, renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, implicit $exec
+    S_CBRANCH_SCC1 %bb.1, implicit killed $scc
----------------
It is weird to have `undef` on all the use operands in this test cause that means the instruction doesn't care about the value it reads - so really no waitcnt is required before any of those reads.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130313/new/

https://reviews.llvm.org/D130313



More information about the llvm-commits mailing list