[llvm-branch-commits] [llvm] [AMDGPU] Fix duplicate s_wait_asynccnt on gfx12-plus (PR #190777)
Sameer Sahasrabuddhe via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Tue Apr 7 04:27:40 PDT 2026
https://github.com/ssahasra created https://github.com/llvm/llvm-project/pull/190777
S_WAIT_ASYNCCNT was missing from counterTypeForInstr(), so isWaitInstr() did not recognize it as a wait instruction. On the fixpoint algorithm's second pass over a loop body, the already-inserted S_WAIT_ASYNCCNT was treated as a normal instruction, causing WAIT_ASYNCMARK to be re-processed and a duplicate S_WAIT_ASYNCCNT to be emitted.
Assisted-By: Claude Opus 4.6
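The failure mode described above can be reproduced in a small toy model. This is only a sketch under stated assumptions, not the logic of SIInsertWaitcnts.cpp: the `Opcode` and `Counter` enums, `counterTypeFor()`, `runPass()`, `countAsyncWaits()` and the `knowsAsync` flag are all hypothetical names introduced for illustration, and the "insert a wait right after the marker unless one is already there" rule is a deliberate simplification of what the real pass does. It only shows why a classifier that fails to recognize an already-inserted wait makes a second pass emit a duplicate.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the opcodes and counters named in the patch.
enum class Opcode { WaitAsyncMark, SWaitAsyncCnt, SWaitXcnt, Other };
enum class Counter { X, Async };

// Toy analogue of counterTypeForInstr(). `knowsAsync` is an assumption of
// this sketch: false models the code before the patch, true models it after.
std::optional<Counter> counterTypeFor(Opcode Op, bool knowsAsync) {
  switch (Op) {
  case Opcode::SWaitXcnt:
    return Counter::X;
  case Opcode::SWaitAsyncCnt:
    if (knowsAsync)
      return Counter::Async;
    return std::nullopt; // Pre-patch: falls through to "not a wait".
  default:
    return std::nullopt;
  }
}

// One simplified pass: every marker must be followed by an async wait.
// An existing wait is reused only if the classifier recognizes it.
void runPass(std::vector<Opcode> &Block, bool knowsAsync) {
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Block[I] != Opcode::WaitAsyncMark)
      continue;
    bool HaveWait = I + 1 < Block.size() &&
                    counterTypeFor(Block[I + 1], knowsAsync) == Counter::Async;
    if (!HaveWait)
      Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(I + 1),
                   Opcode::SWaitAsyncCnt);
  }
}

// Runs the pass twice over the same block, as a fixpoint iteration over a
// loop body would, and returns how many async waits the block ends up with.
std::size_t countAsyncWaits(bool knowsAsync) {
  std::vector<Opcode> Block = {Opcode::Other, Opcode::WaitAsyncMark,
                               Opcode::Other};
  runPass(Block, knowsAsync);
  runPass(Block, knowsAsync);
  return static_cast<std::size_t>(
      std::count(Block.begin(), Block.end(), Opcode::SWaitAsyncCnt));
}
```

In this model, `countAsyncWaits(false)` leaves two waits in the block because the second pass does not see the first pass's wait, while `countAsyncWaits(true)` leaves one.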
From 1bc63af63891c5b8ab365f426b76e59fe3452f11 Mon Sep 17 00:00:00 2001
From: Sameer Sahasrabuddhe <sameer.sahasrabuddhe at amd.com>
Date: Tue, 7 Apr 2026 16:25:41 +0530
Subject: [PATCH] [AMDGPU] Fix duplicate s_wait_asynccnt on gfx12-plus
S_WAIT_ASYNCCNT was missing from counterTypeForInstr(), so
isWaitInstr() did not recognize it as a wait instruction. On the
fixpoint algorithm's second pass over a loop body, the already-inserted
S_WAIT_ASYNCCNT was treated as a normal instruction, causing
WAIT_ASYNCMARK to be re-processed and a duplicate S_WAIT_ASYNCCNT to
be emitted.
Assisted-By: Claude Opus 4.6
---
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 2 ++
llvm/test/CodeGen/AMDGPU/asyncmark-gfx12plus.ll | 1 -
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index eb0b12c3b5bc1..3d6cd101274b2 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1745,6 +1745,8 @@ static std::optional<InstCounterType> counterTypeForInstr(unsigned Opcode) {
     return KM_CNT;
   case AMDGPU::S_WAIT_XCNT:
     return X_CNT;
+  case AMDGPU::S_WAIT_ASYNCCNT:
+    return ASYNC_CNT;
   default:
     return {};
   }
diff --git a/llvm/test/CodeGen/AMDGPU/asyncmark-gfx12plus.ll b/llvm/test/CodeGen/AMDGPU/asyncmark-gfx12plus.ll
index cfb296fb2d529..18064c6e60a77 100644
--- a/llvm/test/CodeGen/AMDGPU/asyncmark-gfx12plus.ll
+++ b/llvm/test/CodeGen/AMDGPU/asyncmark-gfx12plus.ll
@@ -248,7 +248,6 @@ define amdgpu_kernel void @test_pipelined_loop_with_global(ptr addrspace(1) %foo
 ; GFX1250-NEXT: ; asyncmark
 ; GFX1250-NEXT: ; wait_asyncmark(2)
 ; GFX1250-NEXT: s_wait_asynccnt 0x2
-; GFX1250-NEXT: s_wait_asynccnt 0x2
 ; GFX1250-NEXT: s_add_co_i32 s8, s8, 1
 ; GFX1250-NEXT: s_add_co_i32 s9, s9, 4
 ; GFX1250-NEXT: ds_load_b32 v9, v9