[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

Wed Nov 8 11:29:06 PST 2023

================
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   return HasVMemLoad && UsesVgprLoadedOutside;
 }
 
+bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) {
+  bool Modified = false;
+
+  for (auto &MBB : MF) {
----------------
jwanggit86 wrote:

Although both insert s_waitcnt instructions, the new feature is quite different from the existing SIInsertWaitcnts pass. The new feature, controlled by a command-line option, inserts an "s_waitcnt 0" after each memory instruction, so its logic is very simple. The existing pass has more complicated logic: it is essentially a static analysis backed by its own data structures, none of which the new feature needs.
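
To illustrate how simple that logic is, here is a minimal standalone sketch. It is a toy model, not the code in this PR: ToyInst and its IsMemOp flag are hypothetical stand-ins for a machine instruction and a "touches memory" query, and the function works on a plain std::vector rather than a MachineFunction.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction (not an LLVM type).
struct ToyInst {
  std::string Text;
  bool IsMemOp; // assumed flag: true if the instruction accesses memory
};

// Insert an "s_waitcnt 0" right after every memory instruction in the block.
// Returns true if the block was changed.
bool insertWaitcntAfterMemOp(std::vector<ToyInst> &Block) {
  bool Modified = false;
  for (auto It = Block.begin(); It != Block.end(); ++It) {
    if (!It->IsMemOp)
      continue;
    // insert() returns an iterator to the new element, so the loop's ++It
    // then steps past the inserted wait.
    It = Block.insert(std::next(It), ToyInst{"s_waitcnt 0", false});
    Modified = true;
  }
  return Modified;
}
```

No per-instruction state is tracked here, which is the contrast with the counter bookkeeping the existing pass maintains.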

From a performance point of view, note that this feature is off by default, so the extra overhead in the normal use case should be minimized. A separate pass achieves this because it adds only one extra check per compiled function. Integrating with the existing pass, on the other hand, would mean many more checks for whether the feature is active, all of which are wasted in the normal case where it is not.
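
The overhead argument can be made concrete with a rough sketch. Everything here is hypothetical: ToyOption counts how often the option is queried, ToyFunction is just a list of per-block instruction counts, and the "integrated" variant assumes the option would be tested once per instruction, which is only one way an integration could look.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical option wrapper that counts how often it is queried.
struct ToyOption {
  bool Enabled = false;
  std::size_t Checks = 0;
  bool enabled() { ++Checks; return Enabled; }
};

// Hypothetical function model: one entry per block, holding its instruction count.
using ToyFunction = std::vector<std::size_t>;

// Separate pass: the option is tested once on entry; when the feature is off,
// nothing else runs.
void runSeparatePass(const ToyFunction &F, ToyOption &Opt) {
  if (!Opt.enabled())
    return;
  for (std::size_t NumInsts : F)
    (void)NumInsts; // the wait instructions would be inserted here
}

// Integrated variant (assumed shape): the option is tested at every instruction
// the existing pass visits.
void runIntegratedPass(const ToyFunction &F, ToyOption &Opt) {
  for (std::size_t NumInsts : F)
    for (std::size_t I = 0; I < NumInsts; ++I)
      if (Opt.enabled()) {
        // the wait instruction would be inserted here
      }
}
```

With the option off, the first variant performs one check per function and the second performs one per instruction.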

Given these two points, I think a separate pass is preferable to an integrated one. Please let me know your thoughts.

https://github.com/llvm/llvm-project/pull/68932