[llvm] [AMDGPU] Optimize LDS DMA soft waitcnt (PR #138802)
Sameer Sahasrabuddhe via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 19 08:42:09 PDT 2025
================
@@ -1278,6 +1278,23 @@ bool WaitcntGeneratorPreGFX12::applyPreexistingWaitcnt(
if (Opcode == AMDGPU::S_WAITCNT) {
unsigned IEnc = II.getOperand(0).getImm();
AMDGPU::Waitcnt OldWait = AMDGPU::decodeWaitcnt(IV, IEnc);
+
+ // These pseudo waitcnt instructions are only needed to synchronize DS
+ // operations with direct LDS loads that use vmcnt. We can safely relax
+ // them when no outstanding direct LDS loads exist, even if other vmcnt
+ // events are pending.
+ if (II.getOpcode() == AMDGPU::S_WAITCNT_DIRECT_LDS_LOAD_soft &&
----------------
ssahasra wrote:
Right. It's good to know that the legalizer is meant to be safe by default. The problem is that the waitcnt inserter does not have complete knowledge when it relaxes the waitcnt. (This is in addition to the fact that the inserter is also supposed to restore legality between a load and its use, a store and its operands, etc.) In particular, the legalizer will insert vmcnt(0) after an invalidate, but an explicit comment in the inserter says that it should not be aware of invalidates. That's what makes things difficult: the inserter could safely relax a vmcnt(0) if it also correctly tracked the invalidates, and that tracking is the part that is missing right now.
https://github.com/llvm/llvm-project/pull/138802