[llvm] [AMDGPU] Add DS loop waitcnt optimization for GFX12+ (PR #172728)

via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 18 20:40:16 PST 2025


================
@@ -0,0 +1,159 @@
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -run-pass=si-insert-waitcnts -verify-machineinstrs -o - %s | FileCheck %s
+
+# Test 1: Simple case - DS loads only in preheader, no DS loads in loop.
+# The optimization flushes DSCNT in the preheader so that
+# subsequent loop iterations don't need to wait inside the loop.
+
+# CHECK-LABEL: name: ds_preheader_flush_simple
+# CHECK: bb.0:
+# CHECK: DS_READ_B128
+# CHECK: DS_READ_B128
+# CHECK: DS_READ_B128
+# CHECK: DS_READ_B128
+# CHECK: S_WAIT_DSCNT 0
+# CHECK-NEXT: S_BRANCH %bb.1
+# CHECK: bb.1:
+# CHECK-NOT: S_WAIT_DSCNT
+# CHECK: $vgpr30 = V_ADD_F32
+# CHECK-NOT: S_WAIT_DSCNT
+# CHECK: $vgpr31 = V_ADD_F32
+
+--- |
+  define amdgpu_kernel void @ds_preheader_flush_simple() { ret void }
+  define amdgpu_kernel void @ds_loop_prefetch_pattern() { ret void }
+...
----------------
hidekisaito wrote:

Sorry, Cursor added it again here. I need to remember to teach it again (and again).

https://github.com/llvm/llvm-project/pull/172728

