[PATCH] D120544: [AMDGPU] Omit unnecessary waitcnt before barriers
Austin Kerbow via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 25 12:13:50 PST 2022
kerbowa added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1143
if (MI.getOpcode() == AMDGPU::S_BARRIER &&
- !ST->hasAutoWaitcntBeforeBarrier()) {
+ !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) {
Wait = Wait.combined(AMDGPU::Waitcnt::allZero(ST->hasVscnt()));
----------------
rampitec wrote:
> kerbowa wrote:
> > arsenm wrote:
> > > Is this really a distinct feature if it's the same check as auto waitcnt?
> > It's not the same as auto waitcnt. There are distinct subtargets that support each feature. This check is just saying we don't need an explicit waitcnt before barriers under any circumstances if there is an implicit wait by HW.
> >
> > We don't actually use the auto waitcnt subtarget feature since it can be configured dynamically by HW. I was debating removing it entirely.
> But then a waitcnt is still needed after the barrier? I.e. it does not mean that the barrier waits for all outstanding memory operations, right?
If auto waitcnt is enabled, I believe the barrier does wait for all outstanding memory operations. However, the pass will still add a redundant waitcnt after the barrier, because the pass has no optimization for the auto waitcnt feature. AFAIK the feature is not used right now anyway.
If the subtarget has the "BackOffBarrier" feature, then waitcnts are still needed after the barrier.
Normally barriers are paired with fences, which may themselves force a waitcnt before the barrier. This change allows more flexibility and optimization in the cases where the HW does not require waiting before barriers.
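
To make the patched condition easier to reason about, here is a minimal standalone sketch of the check at SIInsertWaitcnts.cpp:1143. It is only a model, not the pass itself: `SubtargetModel`, `needsExplicitWaitBeforeBarrier` and `checkBarrierWaitModel` are hypothetical names invented for illustration, and the two booleans stand in for `ST->hasAutoWaitcntBeforeBarrier()` and `ST->supportsBackOffBarrier()` from the diff.

```cpp
#include <cassert>

// Hypothetical stand-in for the two subtarget queries used in the diff.
struct SubtargetModel {
  bool AutoWaitcntBeforeBarrier; // models ST->hasAutoWaitcntBeforeBarrier()
  bool BackOffBarrier;           // models ST->supportsBackOffBarrier()
};

// Models the patched condition: the pass forces an all-zero waitcnt before
// S_BARRIER only when neither hardware feature is present.
constexpr bool needsExplicitWaitBeforeBarrier(const SubtargetModel &ST) {
  return !ST.AutoWaitcntBeforeBarrier && !ST.BackOffBarrier;
}

// Exercises all four feature combinations of the model.
inline void checkBarrierWaitModel() {
  assert(needsExplicitWaitBeforeBarrier({false, false}));  // neither feature
  assert(!needsExplicitWaitBeforeBarrier({true, false}));  // auto waitcnt only
  assert(!needsExplicitWaitBeforeBarrier({false, true}));  // back-off barrier only
  assert(!needsExplicitWaitBeforeBarrier({true, true}));   // both features
}
```

The model only covers the wait before the barrier. It says nothing about waitcnts after the barrier, which are still needed on "BackOffBarrier" subtargets, or about waits that a paired fence may require.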
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D120544/new/
https://reviews.llvm.org/D120544