[PATCH] D67485: AArch64: use ldp/stp for atomic & volatile 128-bit where appropriate.

Fri Sep 20 05:28:37 PDT 2019

labrinea added a comment.

Hi Tim, thanks for looking into this optimization opportunity. I have a few remarks regarding this change:

- First, it appears that the current codegen (CAS loop) for 128-bit atomic accesses is broken based on this comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814#c3. There are two problematic cases as far as I understand: (1) //const// and (2) //volatile// atomic objects. Const objects disallow write access to the underlying memory, volatile objects mandate that each byte of the underlying memory shall be accessed exactly once according to the AAPCS. The CAS loop violates both.

- Maybe the solution is to follow GCC here, at least for the general case (architectures prior to v8.4), meaning to expand atomic operations into a call to the appropriate __atomic library function . I believe this is what the AtomicExpandPass does in LLVM. In that case we need to provide an implementation of those functions.

- My concern about using ldp/stp is that the specification promises single-copy atomicity provided that accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory. Is there such a guarantee to the compiler? Maybe that's the default for Operating Systems but not for bare-metal? Would it make sense to enable this optimization for certain target triples, i.e. when sys != none?

- In certain code sequences where you have two consecutive atomic stores, or an atomic load followed by an atomic store you'll end up with redundant memory barriers. Is there a way to get rid of them?

- Enabling the corresponding subtarget feature on cyclone doesn't seem safe to me. If we ever implement -mtune, then a command line like `clang -march=armv8a -mtune=cyclone` should mean “generated correct code for the v8.0 architecture, but optimize for cyclone". Adding this feature to cyclone as is would probably result in the above command line producing code that isn’t architecturally correct for v8.0.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67485/new/

https://reviews.llvm.org/D67485