[cfe-dev] Why does __atomic generate IR calls when __sync generates atomicrmw?
Craig, Ben via cfe-dev
cfe-dev at lists.llvm.org
Mon Nov 23 13:37:02 PST 2015
I have noticed that, on my target, __sync_fetch_and_add causes clang to
generate the following...
%0 = atomicrmw add i32* %val, i32 1 seq_cst
... but __atomic_fetch_add with __ATOMIC_SEQ_CST causes the following to
get emitted...
%1 = bitcast i32* %val to i8*
%call = call i32 @__atomic_fetch_add_4(i8* %1, i32 1, i32 5) #1
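For reference, a minimal C sketch of the two source forms being compared. The function names bump_sync and bump_atomic are made up for illustration and are not from my actual code. The last argument in the library call above (i32 5) is the numeric value of __ATOMIC_SEQ_CST.

```c
/* Legacy GCC-style builtin: no memory-order argument, always a full barrier.
 * On the target described above, clang lowers this to an atomicrmw. */
int bump_sync(int *val) {
    return __sync_fetch_and_add(val, 1);
}

/* Newer builtin: the memory order is an explicit argument.
 * If the target's TargetInfo reports no inline atomic support for this width,
 * clang emits a call to __atomic_fetch_add_4 instead of an atomicrmw. */
int bump_atomic(int *val) {
    return __atomic_fetch_add(val, 1, __ATOMIC_SEQ_CST);
}
```

Both functions return the value held before the addition, so the only observable difference is in how the operation is lowered. Compiling with something like clang -S -emit-llvm should show the IR for each form.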
Now, I am aware that I need to tweak my version of
TargetInfo::MaxAtomicInlineWidth, MaxAtomicPromoteWidth, and
hasBuiltinAtomic() in order to get the atomicrmw IR instruction I want.
What I'm not sure of is why.
Why is the function call generated in clang? Why don't we let LLVM choose
whether to emit a library call or inline instructions in these cases?
Another reason I care about this is that some atomic operations
aren't directly supported by my hardware (4-byte and 8-byte atomics are
directly supported; 1-byte and 2-byte atomics are not). I can emulate
them, but I would like to know where the function call should be
emitted. Suppose I lie to clang and tell it to inline my 1-byte and
2-byte atomics, and let LLVM generate the library call. Are there
significant downsides to this approach (like losing the memory model
information), or do I need to implement separate library calls for the
__sync builtins and the __atomic builtins?
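To make the emulation concrete, here is a rough sketch of how a 1-byte fetch-and-add could be built from a 4-byte compare-exchange. The helper name emulated_fetch_add_1 is hypothetical and is not an existing library routine. The sketch assumes the containing 4-byte word is safe to access as a whole, and it always uses sequentially consistent ordering on success, so any weaker order requested by the caller would be ignored.

```c
#include <stdint.h>

/* Hypothetical helper: emulate an 8-bit atomic fetch-add using only
 * 32-bit atomic operations on the aligned word that contains the byte. */
uint8_t emulated_fetch_add_1(uint8_t *ptr, uint8_t val) {
    uintptr_t addr = (uintptr_t)ptr;
    /* Round the address down to the enclosing 4-byte-aligned word. */
    uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
    unsigned byte_index = (unsigned)(addr & 3);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    unsigned shift = (3 - byte_index) * 8;
#else
    unsigned shift = byte_index * 8;
#endif
    uint32_t mask = (uint32_t)0xFF << shift;

    uint32_t old_word = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint32_t new_word;
    do {
        /* Add within the target byte only, wrapping modulo 256. */
        uint32_t new_byte = (((old_word & mask) >> shift) + val) & 0xFF;
        new_word = (old_word & ~mask) | (new_byte << shift);
        /* On failure, old_word is refreshed with the current contents. */
    } while (!__atomic_compare_exchange_n(word, &old_word, new_word,
                                          1 /* weak */,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED));

    /* On success old_word still holds the value that was replaced. */
    return (uint8_t)((old_word & mask) >> shift);
}
```

The neighbouring three bytes are written back unchanged on every attempt, so a concurrent update to any of them just causes a retry rather than a lost write.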
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project