[PATCH] D120544: [AMDGPU] Omit unnecessary waitcnt before barriers
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 25 11:58:51 PST 2022
rampitec added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1143
if (MI.getOpcode() == AMDGPU::S_BARRIER &&
- !ST->hasAutoWaitcntBeforeBarrier()) {
+ !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) {
Wait = Wait.combined(AMDGPU::Waitcnt::allZero(ST->hasVscnt()));
----------------
kerbowa wrote:
> arsenm wrote:
> > Is this really a distinct feature if it's the same check as auto waitcnt?
> It's not the same as auto waitcnt. There are distinct subtargets that support each feature. This check is just saying we don't need an explicit waitcnt before barriers under any circumstances if there is an implicit wait by HW.
>
> We don't actually use the auto waitcnt subtarget feature since it can be configured dynamically by HW. I was debating removing it entirely.
But then waitcount is still needed after the barrier? I.e. it does not mean that barrier wait for all outstanding memory operations, right?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D120544/new/
https://reviews.llvm.org/D120544
More information about the llvm-commits
mailing list