[llvm] [AMDGPU] Optimize LDS DMA soft waitcnt (PR #138802)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 19 01:57:47 PDT 2025
================
@@ -1278,6 +1278,23 @@ bool WaitcntGeneratorPreGFX12::applyPreexistingWaitcnt(
if (Opcode == AMDGPU::S_WAITCNT) {
unsigned IEnc = II.getOperand(0).getImm();
AMDGPU::Waitcnt OldWait = AMDGPU::decodeWaitcnt(IV, IEnc);
+
+ // These pseudo waitcnt instructions are only needed to synchronize DS
+ // operations with direct LDS loads that use vmcnt. We can safely relax
+ // them when no outstanding direct LDS loads exist, even if other vmcnt
+ // events are pending.
+ if (II.getOpcode() == AMDGPU::S_WAITCNT_DIRECT_LDS_LOAD_soft &&
----------------
jayfoad wrote:
> Yes, that's a problem with the current separation of concerns between the memory legalizer and the waitcnt inserter.
I don't understand why you call this a "problem". Maybe it doesn't work in exactly the way you would like, but I think it does work perfectly well.
The legalizer _is_ safe by default, which means it always inserts a zero count wherever a wait of any kind is required. The inserter _can_ safely relax the zero count based on its scores, but this is implemented without the inserter having to know anything about fences, invalidates, etc. specifically; instead, everything the inserter needs to know is in the waitcnt instruction itself.
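As a rough illustration of the "safe by default, relax later" split, here is a toy C++ model. It is not the code in SIInsertWaitcnts.cpp. `VmcntBracket` and `relaxVmcntWait` are hypothetical names, and the model assumes a single in-order vmcnt counter tracked as a lower/upper score pair. The legalizer's conservative `vmcnt(0)` (or any `vmcnt(N)`) is dropped when at most N events can still be outstanding, because the hardware counter then already satisfies the wait.

```cpp
#include <cassert>
#include <optional>

// Hypothetical toy model of a per-counter score bracket, loosely inspired by
// the scoreboard idea in the waitcnt inserter; not LLVM's actual structure.
struct VmcntBracket {
  unsigned LB = 0; // score just below the oldest event that may be outstanding
  unsigned UB = 0; // score of the most recently issued event

  // Record a newly issued VMEM operation that increments vmcnt.
  void issue() { ++UB; }

  // Model executing "s_waitcnt vmcnt(N)": afterwards at most N events remain.
  void applyWait(unsigned N) {
    if (pending() > N)
      LB = UB - N;
  }

  unsigned pending() const {
    assert(LB <= UB && "bracket invariant violated");
    return UB - LB;
  }
};

// Relax a (possibly conservative) "vmcnt(N)" request. Returns std::nullopt if
// the wait is redundant, otherwise the count that must be kept.
std::optional<unsigned> relaxVmcntWait(const VmcntBracket &B, unsigned N) {
  if (B.pending() <= N)
    return std::nullopt; // the counter is already <= N, so the wait is a no-op
  return N;
}
```

For example, with nothing issued, `relaxVmcntWait(B, 0)` returns `std::nullopt`; after two `issue()` calls the `vmcnt(0)` has to stay.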
For most waitcnts, no extra information is needed; it's just a regular waitcnt. For the new case in this patch, the extra information is "this is a wait on vmcnt but I'm only interested in VMEM load-to-LDS instructions".
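The extra information carried by the new pseudo can be modelled in the same toy style. Again this is a hypothetical sketch, not the patch itself: `VmemEvent`, `VmcntQueue` and `relaxDirectLdsSoftWait` are invented names, and completion is modelled as retiring the oldest outstanding event. The soft wait is dropped whenever no load-to-LDS is outstanding, even if ordinary loads or stores still count against vmcnt, which is the condition described in the code comment of the patch.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical event kinds; the real pass tracks many more event types.
enum class VmemEvent { Load, Store, LoadToLds };

// Toy model of the events currently counted by vmcnt, oldest first.
// Completion is modelled by retiring the oldest event.
struct VmcntQueue {
  std::deque<VmemEvent> Pending;

  void issue(VmemEvent E) { Pending.push_back(E); }

  void retireOldest() {
    assert(!Pending.empty() && "no outstanding vmcnt event to retire");
    Pending.pop_front();
  }

  bool hasPendingLoadToLds() const {
    for (VmemEvent E : Pending)
      if (E == VmemEvent::LoadToLds)
        return true;
    return false;
  }
};

// A soft wait that only exists to order DS operations after load-to-LDS.
// Returns std::nullopt if it can be dropped, otherwise the (conservative)
// vmcnt value the legalizer requested.
std::optional<unsigned> relaxDirectLdsSoftWait(const VmcntQueue &Q,
                                               unsigned RequestedVmcnt) {
  // Plain loads/stores may still be pending; they are irrelevant to this wait.
  if (!Q.hasPendingLoadToLds())
    return std::nullopt;
  return RequestedVmcnt;
}
```

When a load-to-LDS is still pending, the sketch simply keeps the count the legalizer asked for. A tighter count (for example, the number of vmcnt events issued after the youngest load-to-LDS) could be derived under an in-order completion assumption, but that is deliberately left out here.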
https://github.com/llvm/llvm-project/pull/138802