[llvm] ef7de8d - [AMDGPU] Remove scope check in SIInsertWaitcnts::generateWaitcntInstBefore (#157821)

via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 12 11:51:42 PDT 2025


Author: choikwa
Date: 2025-09-12T14:51:36-04:00
New Revision: ef7de8d1447c822dec72d685d85053216936b895

URL: https://github.com/llvm/llvm-project/commit/ef7de8d1447c822dec72d685d85053216936b895
DIFF: https://github.com/llvm/llvm-project/commit/ef7de8d1447c822dec72d685d85053216936b895.diff

LOG: [AMDGPU] Remove scope check in SIInsertWaitcnts::generateWaitcntInstBefore (#157821)

This change was motivated by CK where many VMCNT(0)'s were generated due
to instructions lacking !alias.scope metadata. The two causes of this
were:
1) LowerLDSModule not tacking on scope metadata on a single LDS variable
2) IPSCCP pass before inliner replacing noalias ptr derivative with a
global value, which made inliner unable to track it back to the noalias
   ptr argument.

However, it turns out that IPSCCP losing the scope information was
largely ineffectual as ScopedNoAliasAA was able to handle asymmetric
condition, where one MemLoc was missing scope, and still return NoAlias
result.

AMDGPU however was checking for existence of scope in SIInsertWaitcnts
and conservatively treating it as aliasing all and inserted VMCNT(0)
before DS_READs, forcing it to wait for all previous LDS DMA
instructions.

Since we know that ScopedNoAliasAA can handle asymmetry, we should also
allow AA query to determine if two MIs may alias.

Passed PSDB.

Previous attempt to address the issue in IPSCCP, likely stalled:
https://github.com/llvm/llvm-project/pull/154522
This solution may be preferrable over that as issue only affects AMDGPU.

Added: 
    

Modified: 
    llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
    llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index b163a274396ff..ae75fb529dade 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1941,13 +1941,7 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
 
         // LOAD_CNT is only relevant to vgpr or LDS.
         unsigned RegNo = FIRST_LDS_VGPR;
-        // Only objects with alias scope info were added to LDSDMAScopes array.
-        // In the absense of the scope info we will not be able to disambiguate
-        // aliasing here. There is no need to try searching for a corresponding
-        // store slot. This is conservatively correct because in that case we
-        // will produce a wait using the first (general) LDS DMA wait slot which
-        // will wait on all of them anyway.
-        if (Ptr && Memop->getAAInfo() && Memop->getAAInfo().Scope) {
+        if (Ptr && Memop->getAAInfo()) {
           const auto &LDSDMAStores = ScoreBrackets.getLDSDMAStores();
           for (unsigned I = 0, E = LDSDMAStores.size(); I != E; ++I) {
             if (MI.mayAlias(AA, *LDSDMAStores[I], true))

diff  --git a/llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll b/llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll
index 0bd8667d17e52..a00aca34252b1 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll
@@ -26,7 +26,6 @@ define amdgpu_kernel void @test_waitcnt(ptr addrspace(1) %global_buffer, ptr add
 ; CHECK-NEXT:    ds_write_b32 v1, v3
 ; CHECK-NEXT:    ds_write_b32 v2, v3
 ; CHECK-NEXT:    ; sched_barrier mask(0x00000000)
-; CHECK-NEXT:    s_waitcnt vmcnt(0)
 ; CHECK-NEXT:    ds_read_b32 v1, v1
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    global_store_dword v0, v1, s[0:1] offset:16


        


More information about the llvm-commits mailing list