[PATCH] D45321: [atomics] Fix runtime calls for misaligned atomics

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 18 11:52:03 PDT 2018


Did a bit more research into the GNU libatomic behavior.  As far as I can tell, my initial assessment was correct: libatomic never uses unaligned atomic operations.  However, it uses a trick which makes this a little complicated to test: it promotes small atomic operations to pointer size before performing the alignment check.  So, for example, libatomic supports lock-free operations with size=3, depending on the input pointer.  (On x86, it still only promotes up to pointer size, even though larger lock-free atomics are available; not sure why.)
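For illustration, this is roughly the shape of the check as I understand it (hypothetical code with made-up names, not libatomic's actual implementation):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Rough model of libatomic's check: a small access is treated as
     lock-free if, after promoting its size to the pointer size, it
     still fits inside a single aligned word. */
  bool small_op_is_lock_free(size_t size, const void *ptr) {
    const size_t word = sizeof(void *);
    if (size > word)
      return false;                  /* promotion stops at pointer size */
    uintptr_t offset = (uintptr_t)ptr & (word - 1);
    /* e.g. size == 3 at offset 5 of an 8-byte word: 5 + 3 <= 8, so the
       whole object is covered by one aligned word-sized atomic op */
    return offset + size <= word;
  }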



================
Comment at: compiler-rt/lib/builtins/atomic.c:193
+      type tmp = 0;\
+      *((type*)dest) = __c11_atomic_compare_exchange_weak((_Atomic(type)*)src, &tmp, 0, model, model);\
+    }\
----------------
t.p.northover wrote:
> efriedma wrote:
> > efriedma wrote:
> > > Isn't the return value of `__c11_atomic_compare_exchange_weak` the success boolean?
> > > 
> > > Even with that fixed, though, this is UB because you're violating the alignment rules for `_Atomic`.  It might appear to emit the right code for now, but it's a ticking time bomb because the compiler could optimize the "cmpxchg" to a "mov" (since you're not actually modifying the memory on success).
> > > 
> > > The "right" way to do this is to use `__atomic_load` on a pointer to an unaligned type.  Granted, clang currently doesn't lower that to the sequence you want (instead it generates a libcall to `__atomic_load_4`).
> > (Or, of course, you could implement this with inline assembly, if you can't get the compiler to emit the sequence you want.)
> > Isn't the return value of __c11_atomic_compare_exchange_weak the success boolean?
> 
> Oops, yes. I'll fix that.
> 
> > Even with that fixed, though, this is UB because you're violating the alignment rules for _Atomic.
> 
> Oh yes, it's definitely really dodgy. As part of the implementation, I think compiler-rt gets some latitude to rely on Clang's implementation details there, though. It's impossible to implement std::vector without UB, for example, but libc++ keeps chugging along anyway.
> 
> > It might appear to emit the right code for now, but it's a ticking time bomb because the compiler could optimize the "cmpxchg" to a "mov" (since you're not actually modifying the memory on success).
> 
> I don't think it could (ignoring the whole UB => nasal demons thing). The mov would not do an atomic load, which would render the return value potentially invalid. A cmpxchg still needs to return a valid previous value even if memory is not modified.
> 
> It could potentially turn it into an atomic load, and from there back into a call to `__atomic_load_4`, though (assuming it inserted the correct barriers, which might not be possible for a cmpxchg release).
> 
> > The "right" way to do this is to use `__atomic_load` on a pointer to an unaligned type. Granted, clang currently doesn't lower that to the sequence you want (instead it generates a libcall to `__atomic_load_4`).
> 
> Yes. And even if there were a way to get Clang to generate the desired IR, LLVM doesn't do the right thing with it either (it calls `__atomic_load`).
> 
> Inline assembly is an option, but even for just x86 it's significantly uglier because of the whole amd64 vs x86, cmpxchg16b thing.
> The mov would not do an atomic load, which would render the return value potentially invalid. A cmpxchg still needs to return a valid previous value even if memory is not modified.

The normal lowering for an atomic load from an aligned pointer on x86 is "mov"; not sure what you're getting at here.
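To make that concrete, this is just the expected codegen for an ordinary aligned atomic load on x86-64 (illustration only, not part of the patch):

  #include <stdatomic.h>

  int load_aligned(_Atomic int *p) {
    return atomic_load_explicit(p, memory_order_acquire);
  }

  /* With clang -O2 on x86-64 this is essentially:
       movl (%rdi), %eax
       retq
     An acquire (or even seq_cst) load from an aligned pointer is a
     plain "mov"; no lock prefix or fence is required. */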

> even for just x86 it's significantly uglier because of the whole amd64 vs x86, cmpxchg16b thing.

x86 doesn't support unaligned 16-byte atomic operations.
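For reference, the CAS-based load under discussion would look roughly like this once the return value is handled correctly (a sketch of the 4-byte case only, with made-up names; the patch threads a runtime memory-order "model" through instead of hard-coding seq_cst, and this still has the `_Atomic` alignment problem discussed above):

  #include <stdint.h>

  uint32_t cas_based_load_4(uint32_t *src) {
    uint32_t expected = 0;
    /* The builtin's return value is the success boolean; the loaded
       value comes back through "expected" (on failure it is updated
       with the current contents of *src; on success *src was 0). */
    (void)__c11_atomic_compare_exchange_weak((_Atomic(uint32_t) *)src,
                                             &expected, 0,
                                             __ATOMIC_SEQ_CST,
                                             __ATOMIC_SEQ_CST);
    return expected;
  }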


Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D45321




