[llvm] [AMDGPU] always emit a soft wait even if it is trivially ~0 (PR #147257)
Sameer Sahasrabuddhe via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 10 21:02:22 PDT 2025
ssahasra wrote:
> Currently S_WAITCNT_soft has exactly the same execution semantics as S_WAITCNT. (The only difference is whether we want SIInsertWaitcnts to optimize it away if it can prove that it is a no-op.) What I don't like about your patches is that you're assigning some extra meaning to S_WAITCNT_soft vmcnt(63), or to the mere existence of an S_WAITCNT_soft even if the corresponding S_WAITCNT would be a no-op. I think it would be much cleaner to use a different pseudo for that.
> I would prefer that we keep it functionally equivalent to `S_WAITCNT`. I'd like to think that another pass could insert, say, `S_WAITCNT_soft expcnt(0)` and have `SIInsertWaitcnts` treat it as a wait for expcnt that can be relaxed or removed, _without_ getting extra unwanted behaviour to do with waiting for loads-to-LDS.
Aha! Now this kinda makes sense to me. And it brings out something that has been nagging me at the back of my mind. Note how there is a FIXME in the next patch which says that we need to wait for direct loads to LDS only at a release fence, but currently there is no way to detect that situation. What I would really like to have is an S_WAIT_FENCE_soft with extra parameters like address space and ordering. There would be no wait count parameters, because it is entirely up to SIInsertWaitcnts to compute what is needed. Again, this is not specifically about direct loads to LDS.
> Maybe part of the disagreement is about what S_WAITCNT_soft should be. Since it is currently only used in more-or-less one place (*CacheControl::insertWait in SIMemoryLegalizer) I guess you could argue that it should be defined to do whatever that user needs it to do.
I would say the "current user" of S_WAITCNT_soft is all of SIMemoryLegalizer, and every place where `insertWait()` is called, the legalizer is doing so to lower a fence. In all of these places, the more relevant opcode would be S_WAIT_FENCE_soft, since that is what we actually mean. Now if some future pass wanted to really insert a wait count only with no other semantics, it could then use S_WAITCNT_soft because that's what it meant, and not a fence.
A relevant detail is the fact that when lowering a fence, if the legalizer needs to invalidate cache, it currently follows that with a `vmcnt(0)`. One can keep the current behaviour, and also emit S_WAIT_FENCE_soft at the same place. Or one could emit just an S_WAIT_FENCE_soft, and then enhance SIInsertWaitcnts to account for the cache invalidates that need their own wait counts.
https://github.com/llvm/llvm-project/pull/147257
More information about the llvm-commits
mailing list