[llvm] [AMDGPU] Lazily emit waitcnts on function entry (PR #73122)

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 6 07:23:26 PST 2023


================
@@ -292,6 +295,56 @@ class WaitcntBrackets {
     VgprVmemTypes[GprNo] = 0;
   }
 
+  void setNonEntryFunctionInitialState(const MachineFunction &MF) {
+    const MachineRegisterInfo &MRI = MF.getRegInfo();
+    const SIRegisterInfo &TRI = ST->getInstrInfo()->getRegisterInfo();
+
+    // All counters are in unknown states.
+    for (auto T : inst_counter_types())
+      setScoreUB(T, getWaitCountMax(T));
+
+    // There may be pending events of any type.
+    if (ST->hasVscnt()) {
+      setPendingEvent(VMEM_READ_ACCESS);
+      setPendingEvent(VMEM_WRITE_ACCESS);
+      setPendingEvent(SCRATCH_WRITE_ACCESS);
+    } else {
+      setPendingEvent(VMEM_ACCESS);
+    }
+    for (unsigned I = LDS_ACCESS; I < NUM_WAIT_EVENTS; ++I)
+      setPendingEvent(WaitEventType(I));
+
+    auto SetStateForPhysReg = [&](MCRegister Reg) {
+      RegInterval Interval = getRegInterval(Reg, &MRI, &TRI);
+      if (Interval.first < NUM_ALL_VGPRS) {
+        for (int RegNo = Interval.first; RegNo < Interval.second; ++RegNo) {
+          for (auto T : inst_counter_types())
+            setRegScore(RegNo, T, getWaitCountMax(T));
+          VgprVmemTypes[RegNo] = -1;
+        }
+      } else {
+        for (int RegNo = Interval.first; RegNo < Interval.second; ++RegNo)
+          setRegScore(RegNo, LGKM_CNT, getWaitCountMax(LGKM_CNT));
+      }
+    };
+
+    // Live-in registers may depend on any counter.
+    const MachineBasicBlock &EntryMBB = MF.front();
+    for (auto [Reg, Mask] : EntryMBB.liveins()) {
+      // TODO: Use Mask to narrow the interval?
+      SetStateForPhysReg(Reg);
+    }
+
+    // Reserved SGPRs (e.g. stack pointer or scratch descriptor) may depend on
+    // any counter.
+    // FIXME: Why are these not live-in to the function and/or the entry BB?
----------------
arsenm wrote:

SP is always manually managed without memory operations, so I don't think there's any useful way it can depend on a counter. If the caller didn't have FP/BP and was using those registers as general SGPRs, it's maybe possible they were used with a memory operation that we need to complete before initializing them in the prolog.
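
A minimal sketch of that distinction, as a toy model rather than the pass's actual `WaitcntBrackets` logic. `ToyScoreboard`, `issueLoad`, `needsWaitBeforeWrite`, and `waitForAll` are hypothetical names invented for illustration. The assumption is that a register only needs a wait before being overwritten if an outstanding memory operation targets it:

```cpp
#include <array>
#include <cstdint>

// Toy model (hypothetical, not LLVM code): each register remembers the score
// of the last outstanding memory operation that writes it (0 = none issued).
struct ToyScoreboard {
  static constexpr unsigned NumRegs = 8;
  std::array<uint32_t, NumRegs> PendingScore{};
  uint32_t UpperBound = 0; // score of the most recently issued memory op
  uint32_t LowerBound = 0; // every op with score <= LowerBound has completed

  // A load into Reg leaves the register pending until the load returns.
  void issueLoad(unsigned Reg) { PendingScore[Reg] = ++UpperBound; }

  // True if overwriting Reg now could race with an outstanding load into it.
  bool needsWaitBeforeWrite(unsigned Reg) const {
    return PendingScore[Reg] > LowerBound;
  }

  // Models waiting for the counter to drain to zero.
  void waitForAll() { LowerBound = UpperBound; }
};
```

In this model a register that is only ever updated arithmetically (like SP) never gets a pending score, while a register the caller used as a load destination (possibly FP/BP when used as a general SGPR) reports that a wait is needed until the counter has drained.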

https://github.com/llvm/llvm-project/pull/73122