[llvm] [AMDGPU] Add dynamic threshold for DPP atomic optimizer on integer LDS atomics (PR #186762)

Mon Mar 30 05:08:18 PDT 2026

================
@@ -0,0 +1,1528 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+;
+; Test the -amdgpu-atomic-optimizer-dpp-lds-threshold option which controls
+; dynamic DPP vs no-opt branching for integer LDS atomics.
+;
+; Threshold=5: use DPP only when active lanes > 5, otherwise each lane does
+;   its own atomic.
+; Threshold=32: on wave32 this disables DPP entirely (>= wavefront size).
+; Threshold=64: on wave64 this disables DPP entirely (>= wavefront size).
+
+; --- Threshold=5 tests (dynamic branch expected) ---
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32 -mattr=-flat-for-global -amdgpu-atomic-optimizer-strategy=DPP -amdgpu-atomic-optimizer-dpp-lds-threshold=5 < %s | FileCheck -enable-var-scope -check-prefixes=GFX1032-THRESH5 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizer-strategy=DPP -amdgpu-atomic-optimizer-dpp-lds-threshold=5 < %s | FileCheck -enable-var-scope -check-prefixes=GFX1064-THRESH5 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizer-strategy=DPP -amdgpu-atomic-optimizer-dpp-lds-threshold=5 < %s | FileCheck -enable-var-scope -check-prefixes=GFX900-THRESH5 %s
+
+; --- Threshold >= wavefront size tests (DPP fully disabled for int LDS) ---
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32 -mattr=-flat-for-global -amdgpu-atomic-optimizer-strategy=DPP -amdgpu-atomic-optimizer-dpp-lds-threshold=32 < %s | FileCheck -enable-var-scope -check-prefixes=GFX1032-DISABLED %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizer-strategy=DPP -amdgpu-atomic-optimizer-dpp-lds-threshold=64 < %s | FileCheck -enable-var-scope -check-prefixes=GFX1064-DISABLED %s
+
+declare i32 @llvm.amdgcn.workitem.id.x()
+
+@local_var32 = addrspace(3) global i32 undef, align 4
+
+; Test 1: divergent i32 add with result used -- dynamic threshold branch expected
+define amdgpu_kernel void @add_i32_varying(ptr addrspace(1) %out) {
+; GFX1032-THRESH5-LABEL: add_i32_varying:
+; GFX1032-THRESH5:       ; %bb.0: ; %entry
+; GFX1032-THRESH5-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
+; GFX1032-THRESH5-NEXT:    s_cmp_lt_u32 s0, 6
+; GFX1032-THRESH5-NEXT:    s_cbranch_scc0 .LBB0_2
+; GFX1032-THRESH5-NEXT:  ; %bb.1: ; %atomicrmw.no_opt
+; GFX1032-THRESH5-NEXT:    v_mov_b32_e32 v4, 0
+; GFX1032-THRESH5-NEXT:    ds_add_rtn_u32 v4, v4, v0
+; GFX1032-THRESH5-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1032-THRESH5-NEXT:    buffer_gl0_inv
+; GFX1032-THRESH5-NEXT:    s_cbranch_execz .LBB0_3
----------------
perlfu wrote:

Thank you for investigating the oddity. I didn't think it was a bug in this PR, and it shouldn't block it, but it did seem odd.

https://github.com/llvm/llvm-project/pull/186762
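
For readers following the quoted test, the threshold rule described in its header comments (use DPP only when the number of active lanes exceeds the threshold, and never when the threshold is at least the wavefront size) can be modelled with a small sketch. This is only an illustration of the behaviour the test comments and the checked `s_bcnt1_i32_b32` / `s_cmp_lt_u32 s0, 6` sequence describe; the names `popcount` and `choose_path` are hypothetical and do not come from the patch, and the real pass may structure the decision differently.

```python
def popcount(mask: int) -> int:
    """Count set bits, mirroring what s_bcnt1 computes on the exec mask."""
    return bin(mask).count("1")


def choose_path(exec_mask: int, threshold: int, wave_size: int) -> str:
    """Return "dpp" or "no_opt" for an integer LDS atomic (hypothetical model).

    Assumption: a threshold >= wave_size disables DPP outright, as the
    test's header comments state for threshold=32 (wave32) and 64 (wave64).
    """
    if threshold >= wave_size:
        return "no_opt"
    # Only the low wave_size bits of the exec mask are meaningful.
    active = popcount(exec_mask & ((1 << wave_size) - 1))
    # "active < threshold + 1" takes the no_opt block, matching the
    # compare against 6 in the threshold=5 check lines.
    return "dpp" if active > threshold else "no_opt"
```

With a threshold of 5, five active lanes take the per-lane atomic path while six or more take the DPP path, which corresponds to the comparison against 6 in the `GFX1032-THRESH5` check lines.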