[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

Wed Jan 24 16:33:42 PST 2024

================
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI,
   return Changed;
 }
 
+bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) {
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
+  const SIInstrInfo *TII = ST.getInstrInfo();
+  IsaVersion IV = getIsaVersion(ST.getCPU());
+
+  bool Changed = false;
+
+  for (auto &MBB : MF) {
+    for (auto MI = MBB.begin(); MI != MBB.end();) {
+      MachineInstr &Inst = *MI;
+      ++MI;
+      if (Inst.mayLoadOrStore() == false)
+        continue;
+
+      // Todo: if next insn is an s_waitcnt
+      AMDGPU::Waitcnt Wait;
+
+      if (!(Inst.getDesc().TSFlags & SIInstrFlags::maybeAtomic)) {
+        if (TII->isSMRD(Inst)) {          // scalar
----------------
jwanggit86 wrote:

That's a valid point. However, even though it does similar work, I'd say the similarity to SIInsertWaitcnt is only to a certain extent. The differences include: (1) This is for a different purpose, i.e. to support the so-called "precise memory mode", in particular precise memory exceptions, for certain GPUs. (2) This feature is optional while SIInsertWaintcnt is not. (3) The counter values in SIInsertWaitcnt are precise, while in this features the counters are simply set to 0.
If performance is a concern, pls note that this feature is controlled by a command-line option which by default is off. The user has to explicitly give the option for it to work. We assume the user knows there's extra work for the compiler when the option is turned on.

https://github.com/llvm/llvm-project/pull/79236