[cfe-dev] Why does __atomic generate IR calls when __sync generates atomicrmw?

Mon Nov 23 13:37:02 PST 2015

I have noticed that, on my target, __sync_fetch_and_add causes clang to 
generate the following...
     %0 = atomicrmw add i32* %val, i32 1 seq_cst

... but __atomic_fetch_add with __ATOMIC_SEQ_CST causes the following to 
get emitted...
     %1 = bitcast i32* %val to i8*
     %call = call i32 @__atomic_fetch_add_4(i8* %1, i32 1, i32 5) #1
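For reference, a minimal C sketch of the kind of source that exercises both builtins. The function names bump_sync and bump_atomic are made up for illustration; the IR actually emitted depends on the target's TargetInfo settings, so this shows only the source-level difference.

```c
/* Legacy GCC-style builtin: no memory-order argument, treated as a
   full barrier (sequentially consistent). Returns the old value. */
int bump_sync(int *val) {
    return __sync_fetch_and_add(val, 1);
}

/* C11-style builtin: the memory order is passed explicitly.
   Returns the old value. */
int bump_atomic(int *val) {
    return __atomic_fetch_add(val, 1, __ATOMIC_SEQ_CST);
}
```

Compiling this with -S -emit-llvm for the target in question should show which form (atomicrmw or a call to __atomic_fetch_add_4) clang chooses for each function.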

Now, I am aware that I need to tweak my version of 
TargetInfo::MaxAtomicInlineWidth, MaxAtomicPromoteWidth, and 
hasBuiltinAtomic() in order to get the atomicrmw IR instruction I want.  
What I'm not sure of is why.

Why is a function call generated in clang?  Why don't we let LLVM choose 
whether to emit a function call or inline assembly in these cases?

Another reason I care about this is that some atomic operations 
aren't directly supported by my hardware (4-byte and 8-byte atomics are 
directly supported; 1-byte and 2-byte atomics are not).  I can emulate 
them, but I would like to know where the function call should be 
emitted.  Suppose I lie to clang and tell it to inline my 1-byte and 
2-byte atomics, and let LLVM generate the library call.  Are there 
significant downsides to this approach (like losing the memory model 
information), or do I need to implement separate library calls for the 
__sync builtins and the __atomic builtins?
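To make "emulate" concrete, here is a rough sketch of one common way to do it: a 1-byte fetch-add built from a 4-byte compare-and-swap loop on the aligned word containing the byte. The helper name emulated_fetch_add_u8 is hypothetical, and this is not necessarily what a real runtime library does. It assumes the containing 4-byte word may be safely read and written, and it sidesteps strict-aliasing concerns that production code would need to address.

```c
#include <stdint.h>

/* Hypothetical helper: emulate an 8-bit atomic fetch-add using only
   32-bit atomic operations. Returns the previous value of *ptr. */
uint8_t emulated_fetch_add_u8(uint8_t *ptr, uint8_t delta) {
    uintptr_t addr = (uintptr_t)ptr;
    /* Round down to the 4-byte-aligned word that contains the byte. */
    uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
    unsigned byte_index = (unsigned)(addr & 3);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    unsigned shift = (3 - byte_index) * 8;
#else
    unsigned shift = byte_index * 8;
#endif
    uint32_t mask = (uint32_t)0xFF << shift;

    uint32_t old_word = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint32_t new_word;
    do {
        /* Update only the target byte; keep the other three as-is. */
        uint8_t old_byte = (uint8_t)((old_word & mask) >> shift);
        uint8_t new_byte = (uint8_t)(old_byte + delta);
        new_word = (old_word & ~mask) | ((uint32_t)new_byte << shift);
        /* On failure, old_word is refreshed with the current contents
           and the loop retries. */
    } while (!__atomic_compare_exchange_n(word, &old_word, new_word,
                                          1 /* weak */,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED));
    return (uint8_t)((old_word & mask) >> shift);
}
```

The memory order is hard-coded to seq_cst here; a library entry point such as __atomic_fetch_add_1 receives the order as an argument, which is the information I am worried about losing.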

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
