[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

Tue Feb 20 16:41:47 PST 2024

================
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode",
   "Enable CU wavefront execution mode"
 >;
 
+def FeaturePreciseMemory
----------------
jwanggit86 wrote:

@Pierre-vh With the suggested change, the function `getAMDGPUTargetFeatures` looks something like the following:
```
void amdgpu::getAMDGPUTargetFeatures(...) {
...
  if (Args.hasFlag(options::OPT_mwavefrontsize64,
                   options::OPT_mno_wavefrontsize64, false))
    Features.push_back("+wavefrontsize64");

  if (Args.hasFlag(options::OPT_mamdgpu_precise_memory_op,
                   options::OPT_mno_amdgpu_precise_memory_op, false)) {
    Features.push_back("+precise-memory");
  }
  handleTargetFeaturesGroup(D, Triple, Args, Features,
                            options::OPT_m_amdgpu_Features_Group);
}
```

However, `handleTargetFeaturesGroup` does not appear to check whether an Arg has already been claimed. It processes every Arg in the group, so the same option is emitted twice under two different feature names:
`"-target-feature" "+precise-memory" "-target-feature" "+amdgpu-precise-memory-op"`
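
To make the double emission concrete, here is a small standalone model of the flow. It is only a sketch, not the clang code: `ToyArg`, `toyHasFlag`, `toyHandleFeaturesGroup`, and `toyGetFeatures` are hypothetical stand-ins, and it assumes every argument passed in belongs to the features group and that the group handler derives the feature name by dropping the leading `m` and an optional `no-`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed driver argument (not clang's Arg class).
struct ToyArg {
  std::string Name; // option name without the leading '-'
  bool Claimed;     // set once some code has consumed the argument
};

// Toy explicit check: claims matching args, and the last occurrence wins.
bool toyHasFlag(std::vector<ToyArg> &Args, const std::string &Pos,
                const std::string &Neg, bool Default) {
  bool Result = Default;
  for (ToyArg &A : Args) {
    if (A.Name == Pos || A.Name == Neg) {
      A.Claimed = true;
      Result = (A.Name == Pos);
    }
  }
  return Result;
}

// Toy group handler: visits every arg and never looks at Claimed.
void toyHandleFeaturesGroup(std::vector<ToyArg> &Args,
                            std::vector<std::string> &Features) {
  for (ToyArg &A : Args) {
    assert(A.Name.size() > 1 && A.Name[0] == 'm' && "expected an -m option");
    std::string Name = A.Name.substr(1);         // skip the "m"
    bool IsNegative = Name.rfind("no-", 0) == 0; // starts with "no-"?
    if (IsNegative)
      Name = Name.substr(3);
    A.Claimed = true;
    Features.push_back((IsNegative ? "-" : "+") + Name);
  }
}

// Mirrors the order of operations in the snippet above.
std::vector<std::string> toyGetFeatures(std::vector<ToyArg> Args) {
  std::vector<std::string> Features;
  if (toyHasFlag(Args, "mamdgpu-precise-memory-op",
                 "mno-amdgpu-precise-memory-op", false))
    Features.push_back("+precise-memory");
  toyHandleFeaturesGroup(Args, Features);
  return Features;
}
```

Calling `toyGetFeatures` with a single `mamdgpu-precise-memory-op` argument yields both `+precise-memory` and `+amdgpu-precise-memory-op`, because the explicit check claims the argument but the group loop still visits it.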

https://github.com/llvm/llvm-project/pull/79236