[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)
Matt Arsenault via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Jun 27 06:14:19 PDT 2024
================
@@ -49,7 +49,7 @@ void test_s_wait_event_export_ready() {
}
// CHECK-LABEL: @test_global_add_f32
-// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.global.atomic.fadd.f32.p1.f32(ptr addrspace(1) %{{.*}}, float %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(1) %addr, float %x syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.ignore.denormal.mode !{{[0-9]+$}}
----------------
That's the most conservative option. The current intrinsic handling isn't treated as an atomic at all, and the lowering adds a volatile flag instead. With seq_cst you end up with an additional cache flush compared to the current intrinsic. Release seems to be the strongest ordering that doesn't introduce a new flush after
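To illustrate, a hedged sketch of what the CHECK line above would match if the builtin were lowered with the weaker `release` ordering instead of `seq_cst` (the metadata node `!0` is a placeholder; the exact operands follow the diff above):

```llvm
; Same atomicrmw as in the updated test, but with release ordering,
; which per the comment above avoids the extra cache flush that
; seq_cst would introduce on this target.
%ret = atomicrmw fadd ptr addrspace(1) %addr, float %x syncscope("agent") release, align 4, !amdgpu.no.fine.grained.memory !0, !amdgpu.ignore.denormal.mode !0
```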
Also, running codegen and checking the ISA is generally discouraged in clang tests.
https://github.com/llvm/llvm-project/pull/96872