[clang] [llvm] [AMDGPU] Introduce asyncmark/wait intrinsics (PR #173259)

Mon Dec 22 20:10:26 PST 2025

================
@@ -1107,13 +1153,31 @@ void WaitcntBrackets::updateByEvent(WaitEventType E, MachineInstr &Inst) {
         setVMemScore(LDSDMA_BEGIN + Slot, T, CurrScore);
     }
 
+    if (Context->isAsyncLdsDmaWrite(Inst) && T == LOAD_CNT) {
+      // FIXME: Not supported on GFX12 yet. Will need a new feature when we do.
+      assert(!SIInstrInfo::usesASYNC_CNT(Inst));
+      AsyncScore[T] = CurrScore;
+    }
+
     if (SIInstrInfo::isSBarrierSCCWrite(Inst.getOpcode())) {
       setRegScore(AMDGPU::SCC, T, CurrScore);
       PendingSCCWrite = &Inst;
     }
   }
 }
 
+void WaitcntBrackets::recordAsyncMark(MachineInstr &Inst) {
+  AsyncMarkers.emplace_back(AsyncScore);
----------------
ssahasra wrote:

Wow, that sent me down a minor rabbit-hole:
https://stackoverflow.com/questions/10890653/why-would-i-ever-use-push-back-instead-of-emplace-back
https://abseil.io/tips/112

So the conclusion seems to be: always prefer `push_back()`, unless you are absolutely sure that the performance gain from `emplace_back()` is noticeable and you know what you are doing?
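
For illustration, here is a minimal sketch of the hazard those links describe: `emplace_back()` forwards its arguments to any constructor, including `explicit` ones, while `push_back()` only accepts an already-constructed element or an implicit conversion. `ScoreSnapshot`, `appendWithPushBack` and `appendWithEmplaceBack` are hypothetical names made up for this example; they are not the types or helpers used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-counter score snapshot (assumption for
// illustration only; not the type used in the patch).
using ScoreSnapshot = std::vector<unsigned>;

// push_back() takes an element of the container's value type, so only
// implicit conversions are allowed at the call site.
inline std::size_t appendWithPushBack(std::vector<ScoreSnapshot> &Markers,
                                      const ScoreSnapshot &Scores) {
  Markers.push_back(Scores); // copies Scores into the container
  // Markers.push_back(3);   // would not compile: vector(size_type) is explicit
  return Markers.back().size();
}

// emplace_back() forwards its arguments to whichever constructor matches,
// explicit or not.
inline std::size_t appendWithEmplaceBack(std::vector<ScoreSnapshot> &Markers) {
  Markers.emplace_back(3); // compiles: builds a snapshot holding three zeros
  assert(Markers.back().size() == 3);
  return Markers.back().size();
}
```

When the argument is already an object of the element type, as in the first helper, both calls end up invoking the same copy or move constructor, so `push_back()` loses nothing and states the intent more directly.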

https://github.com/llvm/llvm-project/pull/173259