[PATCH] D18201: Switch over targets to use AtomicExpandPass, and clean up target atomics code.

Wed Mar 16 20:36:00 PDT 2016

jyknight added a comment.

Sorry, yes. The handling of extra-wide loads/stores slipped my mind when writing the comment, although I was aware of it earlier. I'd also note that ARM has a similar thing: you need to use ll/sc for wide loads and stores, not normal loads and stores (except, of course, for 64-bit loads/stores on CPUs that support LPAE, per the unresolved FIXME note there).

See shouldExpandAtomicStoreInIR and shouldExpandAtomicLoadInIR for how this is handled in llvm these days (both before and after this patch).

What I **meant** to say is that I believe there's no situation where you should ever see an expansion of an atomic store to a ``__sync_lock_test_and_set_*`` call or of an atomic load to a ``__sync_val_compare_and_swap_*`` call, which is really all that this code was being used for.

After this change, cmpxchg16b is (still) emitted for 16-byte atomic load/store operations on x86-64, and no libcall is needed... when your CPU supports that instruction.
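To make concrete how a compare-and-swap instruction can stand in for a plain atomic load or store, here is a minimal sketch. It uses an 8-byte ``std::atomic`` so it builds anywhere without extra flags or libatomic; the assumption is that the 16-byte cmpxchg16b case follows the same shape. The helper names ``load_via_cas`` and ``store_via_cas`` are made up for illustration and are not LLVM functions.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: an atomic load built only from compare-and-swap.
// We try to swap 0 for 0. If the current value is 0, the CAS succeeds and
// writes 0 back (no visible change). Otherwise it fails and copies the
// current value into `expected`. Either way `expected` ends up holding
// the value that was in memory.
inline std::uint64_t load_via_cas(std::atomic<std::uint64_t>& a) {
    std::uint64_t expected = 0;
    a.compare_exchange_strong(expected, 0);
    return expected;
}

// Hypothetical helper: an atomic store built from a CAS loop.
// On failure, compare_exchange_weak refreshes `expected` with the current
// value, so the loop retries until the swap to `v` goes through.
inline void store_via_cas(std::atomic<std::uint64_t>& a, std::uint64_t v) {
    std::uint64_t expected = a.load(std::memory_order_relaxed);
    while (!a.compare_exchange_weak(expected, v)) {
        // retry with the refreshed `expected`
    }
}
```

One consequence of this scheme is that even the "load" may write to memory, so it needs a writable location.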

But there is one thing that might affect Darwin, come to think of it. In clang (in http://reviews.llvm.org/D17933) and in LLVM (here), we now correctly check whether the architecture *actually has* the cx16 instruction when setting the maximum atomic size. Previously, some code assumed that all x86-64 CPUs had it while other code did not, which was inconsistent. Thus, if you compile for an x86 CPU without cmpxchg16b support, you will (now) get ``__atomic_*`` libcalls. That's absolutely the correct behavior.
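As a rough illustration of the decision described above, here is a toy model. It is not clang's or LLVM's actual code: ``X86_64Target``, ``maxLockFreeBytes``, ``Lowering`` and ``lowerAtomic`` are all hypothetical names, and the model only covers x86-64, where 8-byte atomics are assumed to always be available inline.

```cpp
#include <cstddef>

// Hypothetical description of an x86-64 target; only the cx16 feature
// matters for this sketch.
struct X86_64Target {
    bool hasCX16;  // does the CPU support cmpxchg16b?
};

// Hypothetical: the largest atomic access the target can do inline.
// With cx16 we can handle 16 bytes; without it, only 8.
constexpr std::size_t maxLockFreeBytes(X86_64Target t) {
    return t.hasCX16 ? 16 : 8;
}

enum class Lowering {
    Inline,         // native instructions (e.g. cmpxchg16b for 16 bytes)
    AtomicLibcall,  // an __atomic_* library call
};

// Hypothetical: anything wider than the target's maximum becomes a libcall.
constexpr Lowering lowerAtomic(std::size_t sizeBytes, X86_64Target t) {
    return sizeBytes <= maxLockFreeBytes(t) ? Lowering::Inline
                                            : Lowering::AtomicLibcall;
}
```

In this model, a 16-byte atomic on a target with ``hasCX16 == false`` yields ``Lowering::AtomicLibcall``, matching the ``__atomic_*`` behavior described above, while the same access with ``hasCX16 == true`` stays inline.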

Before, by contrast, clang would pretend you always had lock-free 16-byte atomics, and it emitted an LLVM 'store atomic' instruction. LLVM's x86 backend would not have expanded that to cmpxchg in shouldExpandAtomicStoreInIR, so you'd be left with a 16-byte ATOMIC_STORE DAG node, which would then be lowered to ATOMIC_SWAP and then to ``__sync_lock_test_and_set_16`` via the removed code we're talking about. But if you're actually on a CPU without cmpxchg16b, there's no way to implement that function according to spec, since you're not supposed to use a mutex.

I don't think Darwin actually supports any pre-cx16 CPUs though, so this really shouldn't affect it. But perhaps somewhere a default CPU architecture needs to be set, if it's not already?

http://reviews.llvm.org/D18201