[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)
Pierre van Houtryve via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Aug 1 23:59:13 PDT 2024
================
@@ -19273,9 +19269,14 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
EmitScalarExpr(E->getArg(3)), AO, SSID);
} else {
- // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
- SSID = llvm::SyncScope::System;
- AO = AtomicOrdering::SequentiallyConsistent;
+ // Most of the builtins do not have syncscope/order arguments. For DS
+ // atomics the scope doesn't really matter, as they implicitly operate at
+ // workgroup scope.
+ //
+ // The global/flat cases need to use agent scope to consistently produce
+ // the native instruction instead of a cmpxchg expansion.
+ SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
----------------
Pierre-vh wrote:
What happens with system (the default) ? I'm not sure I like using `agent` just to force the right expansion when there is no memory model motivation behind it. Do we have a precedent for this kind of thing?
Could codegen be fixed so you can just use `system`?
https://github.com/llvm/llvm-project/pull/96872
More information about the llvm-branch-commits
mailing list