[PATCH] D120544: [AMDGPU] Omit unnecessary waitcnt before barriers

Fri Feb 25 12:13:50 PST 2022

kerbowa added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1143
   if (MI.getOpcode() == AMDGPU::S_BARRIER &&
-      !ST->hasAutoWaitcntBeforeBarrier()) {
+      !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) {
     Wait = Wait.combined(AMDGPU::Waitcnt::allZero(ST->hasVscnt()));
----------------
rampitec wrote:
> kerbowa wrote:
> > arsenm wrote:
> > > Is this really a distinct feature if it's the same check as auto waitcnt?
> > It's not the same as auto waitcnt. There are distinct subtargets that support each feature. This check is just saying we don't need an explicit waitcnt before barriers under any circumstances if there is an implicit wait by HW.
> > 
> > We don't actually use the auto waitcnt subtarget feature since it can be configured dynamically by HW. I was debating removing it entirely.
> But then a waitcnt is still needed after the barrier? I.e. it does not mean that the barrier waits for all outstanding memory operations, right?
If auto waitcnt is enabled, I believe the barrier does wait for all outstanding memory operations. However, the pass will still insert a redundant waitcnt after the barrier because its handling of the auto waitcnt feature is not optimized. As far as I know, that feature is not used right now.

If the subtarget has the "BackOffBarrier" feature, then waitcnts are still needed after the barrier.

Normally barriers are paired with fences, which may also force a waitcnt before the barrier. This change allows more flexibility and optimization in the cases where HW does not require waiting before barriers.
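
To make the patched condition easier to follow, here is a minimal standalone sketch of the decision it encodes. `SubtargetModel` and `needsFullWaitBeforeBarrier` are hypothetical names used only for illustration; they are not part of SIInsertWaitcnts.cpp, and the two fields merely stand in for the `hasAutoWaitcntBeforeBarrier()` and `supportsBackOffBarrier()` queries shown in the diff.

```cpp
// Hypothetical stand-in for the two subtarget queries used in the diff.
// The real GCNSubtarget class is not reproduced here.
struct SubtargetModel {
  bool HasAutoWaitcntBeforeBarrier; // HW performs an implicit wait itself
  bool SupportsBackOffBarrier;      // HW does not require a wait beforehand
};

// Mirrors the patched condition: an explicit "wait for everything" is only
// forced before S_BARRIER when neither feature makes it unnecessary.
constexpr bool needsFullWaitBeforeBarrier(const SubtargetModel &ST) {
  return !ST.HasAutoWaitcntBeforeBarrier && !ST.SupportsBackOffBarrier;
}
```

Under this model only a subtarget with neither feature gets the all-zero waitcnt before the barrier. The sketch says nothing about waits after the barrier, which the pass still computes separately.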

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120544/new/

https://reviews.llvm.org/D120544