[PATCH] D67485: AArch64: use ldp/stp for atomic & volatile 128-bit where appropriate.

Alexandros Lamprineas via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 23 09:29:18 PDT 2019


labrinea added a comment.

In D67485#1676748 <https://reviews.llvm.org/D67485#1676748>, @t.p.northover wrote:

> I think Clang is involved there too, in horribly non-obvious ways (for example I think that's the only way to get the actual libcalls you want rather than legacy ones). Either way, that's a change that would need pretty careful coordination. Since all of our CPUs are Cyclone or above we could probably just skip the libcalls entirely at Apple without ABI breakage (which, unintentionally, is what this patch does).


I am not sure I am following here. According to https://llvm.org/docs/Atomics.html the AtomicExpandPass will translate atomic operations on data sizes above MaxAtomicSizeInBitsSupported into calls to atomic libcalls. The docs say that even though the libcalls share the same names as the clang builtins, they are not directly related to them. Indeed, I hacked the AArch64 backend to disallow codegen for 128-bit atomics and as a result LLVM emitted calls to `__atomic_store_16` and `__atomic_load_16`. Are those legacy names? I also tried emitting IR for the clang builtins and I saw atomic load/store IR instructions (like those in your tests), no libcalls. Anyhow, my concern here is that if at some point in the future we replace the broken CAS loop with a libcall, the current patch will break ABI compatibility between v8.4 objects with atomic ldp/stp and v8.X objects without the extension. Moreover, this ABI incompatibility already exists between objects built with LLVM and GCC. Any thoughts?
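For illustration, this is the shape of IR I was experimenting with (a sketch; the function name is mine). Once the backend reports MaxAtomicSizeInBitsSupported below 128, AtomicExpandPass rewrites the load into a libcall:

```llvm
define i128 @load128(i128* %p) {
  ; With 128-bit atomics disallowed in the backend, AtomicExpandPass
  ; replaces this instruction with a call to __atomic_load_16
  ; (the size-specialized form of the generic __atomic_load).
  %v = load atomic i128, i128* %p seq_cst, align 16
  ret i128 %v
}
```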

> I don't think anyone has written down a guarantee, but we've pretty much always assumed we're accessing reasonably normal memory. `dmb` instructions are always `ish` for example. I've never had any comments from our more embedded developers on that front (or seen anyone try to do general atomics in another realm). I suspect they go to assembly for the few spots it might matter.

Fair enough.

> ARM has a pass designed to merge adjacent barriers, though I've seen it miss some cases. We might think about porting it to AArch64, or maybe doing some work in AtomicExpansion in generic way.

Sounds good. On that note, could you add some test cases (like the sequences I mentioned) with a FIXME label as motivating examples of such an optimization for future work?
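Something along these lines, say (a rough sketch; on AArch64 each `fence seq_cst` currently lowers to its own `dmb ish`, so a merging pass could drop the redundant one):

```llvm
define void @redundant_fences() {
  ; FIXME: adjacent barriers -> two dmb ish instructions;
  ; a barrier-merging pass could fold these into one.
  fence seq_cst
  fence seq_cst
  ret void
}
```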


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67485/new/

https://reviews.llvm.org/D67485




