[PATCH] D45321: [atomics] Fix runtime calls for misaligned atomics

Tim Northover via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 18 05:06:07 PDT 2018


t.p.northover added a comment.

Sorry it takes me so long to reply to this each time. I really need to plan my time better.



================
Comment at: compiler-rt/lib/builtins/atomic.c:193
+      type tmp = 0;\
+      *((type*)dest) = __c11_atomic_compare_exchange_weak((_Atomic(type)*)src, &tmp, 0, model, model);\
+    }\
----------------
efriedma wrote:
> efriedma wrote:
> > Isn't the return value of `__c11_atomic_compare_exchange_weak` the success boolean?
> > 
> > Even with that fixed, though, this is UB because you're violating the alignment rules for `_Atomic`.  It might appear to emit the right code for now, but it's a ticking time bomb because the compiler could optimize the "cmpxchg" to a "mov" (since you're not actually modifying the memory on success).
> > 
> > The "right" way to do this is to use `__atomic_load` on a pointer to an unligned type.  Granted, clang currently doesn't lower that to the sequence you want (instead it generates a libcall to `__atomic_load_4`).
> (Or, of course, you could implement this with inline assembly, if you can't get the compiler to emit the sequence you want.)
> Isn't the return value of __c11_atomic_compare_exchange_weak the success boolean?

Oops, yes. I'll fix that.
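
Roughly what I have in mind (just a sketch, reusing the macro's names and dropping the line continuations): keep the compare-exchange, discard its boolean result, and store `tmp` instead.

      type tmp = 0;
      /* On failure (even a spurious one) the builtin writes the value it
         observed into tmp; on success the observed value was already 0, which
         is what tmp holds.  Either way tmp ends up with the loaded value. */
      __c11_atomic_compare_exchange_weak((_Atomic(type)*)src, &tmp, 0, model, model);
      *((type*)dest) = tmp;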

> Even with that fixed, though, this is UB because you're violating the alignment rules for _Atomic.

Oh yes, it's definitely really dodgy. As part of the implementation, though, I think compiler-rt gets some latitude to rely on Clang's implementation details there. It's impossible to implement std::vector without UB, for example, but libc++ keeps chugging along anyway.

> It might appear to emit the right code for now, but it's a ticking time bomb because the compiler could optimize the "cmpxchg" to a "mov" (since you're not actually modifying the memory on success).

I don't think it could (ignoring the whole UB => nasal demons thing). The mov would not be an atomic load on a misaligned address, which could make the returned value invalid. A cmpxchg still has to return a valid previous value even if memory is not modified.

It could potentially turn it into an atomic load and then back into a call to `__atomic_load_4`, though (assuming it inserted the correct barriers, which might not be possible for a cmpxchg with release ordering).

> The "right" way to do this is to use `__atomic_load` on a pointer to an unligned type. Granted, clang currently doesn't lower that to the sequence you want (instead it generates a libcall to `__atomic_load_4`).

Yes. And even if there were a way to get Clang to generate the desired IR, LLVM doesn't do the right thing with it either (it calls `__atomic_load`).
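
For reference, I think the shape you're describing is roughly this (sketch only; the `unaligned_u32` typedef and `load_relaxed` name are made up), and today it gets lowered to a libcall rather than the inline cmpxchg sequence:

    #include <stdint.h>

    /* On a typedef, __attribute__((aligned)) may lower the alignment, giving
       a uint32_t whose alignment is 1. */
    typedef uint32_t unaligned_u32 __attribute__((aligned(1)));

    uint32_t load_relaxed(unaligned_u32 *src) {
      uint32_t result;
      __atomic_load(src, &result, __ATOMIC_RELAXED);
      return result;
    }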

Inline assembly is an option, but even just for x86 it's significantly uglier because of the whole amd64 vs. x86 split and the cmpxchg16b situation.
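
To give an idea, here's a rough sketch of just the 4-byte case (GCC-style asm; the `atomic_load_4_misaligned` name is made up): a `lock cmpxchg` with expected and desired both zero either leaves memory untouched or rewrites the same zero, and in both cases leaves the previous value in EAX. The 8- and 16-byte variants need cmpxchg8b/cmpxchg16b with their fixed register pairs, which is where it gets ugly.

    #include <stdint.h>

    /* Rough sketch: 4-byte atomic load from a possibly misaligned address.
       The lock prefix works on misaligned addresses (at the cost of a bus
       lock), and cmpxchg always leaves the previous memory value in EAX,
       whether or not the exchange happened. */
    static uint32_t atomic_load_4_misaligned(volatile uint32_t *src) {
      uint32_t expected = 0;
      uint32_t desired = 0;
      __asm__ __volatile__("lock; cmpxchgl %[des], %[mem]"
                           : "+a"(expected), [mem] "+m"(*src)
                           : [des] "r"(desired)
                           : "memory", "cc");
      return expected;
    }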


Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D45321
