[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

Matt Arsenault via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Thu Jun 27 06:14:19 PDT 2024


================
@@ -49,7 +49,7 @@ void test_s_wait_event_export_ready() {
 }
 
 // CHECK-LABEL: @test_global_add_f32
-// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.global.atomic.fadd.f32.p1.f32(ptr addrspace(1) %{{.*}}, float %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(1) %addr, float %x syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.ignore.denormal.mode !{{[0-9]+$}}
----------------
arsenm wrote:

That's the most conservative option. The current intrinsic handling isn't treated as an atomic at all; the lowering adds a volatile flag instead. With seq_cst you end up with an additional cache flush compared to the current intrinsic. Release seems to be the strongest ordering that doesn't introduce a new flush after the operation.
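For illustration only (this is a hypothetical sketch, not text from the patch), the weaker ordering suggested above would look like the following IR; the function name and metadata node are assumptions:

```llvm
; Sketch: the same atomicrmw as in the CHECK line, but with "release"
; ordering instead of seq_cst, which avoids the extra cache flush the
; stronger ordering would insert after the operation.
define float @sketch_global_add_f32(ptr addrspace(1) %addr, float %x) {
entry:
  %ret = atomicrmw fadd ptr addrspace(1) %addr, float %x syncscope("agent") release, align 4, !amdgpu.no.fine.grained.memory !0
  ret float %ret
}

!0 = !{}
```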

Running codegen and checking the generated ISA is generally discouraged in clang tests.

https://github.com/llvm/llvm-project/pull/96872
