<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 27, 2016 at 11:55 AM, James Knight via cfe-dev <span dir="ltr"><<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">Right now, the atomics implementation in clang is a bit of a mess. It has three basically completely distinct code-paths:</div>
<ol>
<li style="margin:0px;line-height:normal;color:rgb(35,35,35)">There's the legacy __sync_* builtins, which clang lowers directly to atomic IR instructions. Then, the llvm atomic IR instructions themselves can sometimes emit libcalls to __sync_* library functions (which are basically undocumented, and users are often responsible for implementing themselves if they need it).</li>
<li style="margin:0px;line-height:normal;color:rgb(35,35,35)">There's the new __atomic_* builtins, which clang will, depending on size and alignment and target, lower either to a libcall to a "<a href="https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary" target="_blank"><span style="color:rgb(18,85,204)">standardized-by-GCC</span></a>" __atomic_* library function (implemented by libatomic), or, to the atomic IR instructions. (Also covered by the same code is the __c11_atomic_* builtins, which are almost-identical clang-specific aliases of the __atomic_* builtins.)</li>
<li style="margin:0px;line-height:normal;color:rgb(35,35,35)">There's the C11 "_Atomic" type modifier and some OpenMP stuff, which clang lowers into atomic IR or libcalls to __atomic_* library functions, via almost completely separate code from #2. (Note: doesn't apply to C++ std::atomic<>, which gets implemented with the builtins instead). Beyond just a lot of duplicated logic in the code, the _Atomic impl is kinda broken: sometimes it emits llvm IR instructions directly (letting things fall back to __sync_* calls), and sometimes it checks if it should emit a libcall first (falling back to __atomic_* calls). That's pretty bad behavior.</li>
</ol><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">I'd like to make a proposal for cleaning this mess up.</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">BTW, as a sidenote...</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">One thing that's important to remember: at least on the C/C++ level, if you support lock-free atomic-ops for a given size/alignment, <i>ALL</i> the atomic ops for that size/alignment must be lock-free. This property is usually quite easy, because you'll have either LL/SC or CAS instructions, which can be used to implement any other operation. <span>However, many older CPU models support atomic load, store, and exchange operations, but are missing any way to do an atomic compare-and-swap. This is true for at least ARMv5, SPARCv8, and Intel 80386.</span><span style="color:rgb(34,34,34)"> </span></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div style="margin:0px;line-height:normal;color:rgb(35,35,35)"><span> When your minimum CPU is set to such a cpu model, all atomic operations must be done via libcall -- it's not acceptable for atomic_store to emit an atomic "store" instruction, but atomic_fetch_add to require a libcall which gets implemented with a lock. If that were to happen, then atomic_fetch_add could not actually be made atomic versus a simultaneous atomic_store.</span></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">So anyhow, there's basically two paths I think we could take for cleanup. I like "Alternative A" below better, but would be interested to hear if others have reasons to think the other would be preferable for some reason.</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)"><i>Both</i> alternatives I've suggested will have the effect that the __sync_* builtins in clang will now lower to __atomic_* function calls on platforms without inline atomics (or for unsupported sizes), and C11 atomics will stop being schizophrenic. Clang will cease to ever emit a call to a __sync_* function from any of its builtins or language support. <b>This could theoretically cause an incompatibility on some target. </b></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">However, I think it ought to be pretty safe: I can't imagine anyone who will use an upgraded compiler has __sync_* functions implemented for their platform, but doesn't have the __atomic_* library functions implemented. The __atomic_* builtins (which already use those library calls) are in widespread use, notably, both in libstdc++ and libc++, as well as in a lot of third-party code. IMO it's worth having that potential incompatibility, in order to simplify the situation for users (only one set of atomic library calls to worry about), and to have a single code-path for atomics in the compiler.</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)"><b>Alternative A</b>: Move all atomic libcall emission into LLVM</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">In this alternative, LLVM will be responsible for lowering all atomic operations (to inline code or libcalls as appropriate) for all three code-paths listed at the beginning. Clang will emit no libcalls for atomic operations itself.</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">A1) In LLVM: when lowering "store atomic", "load atomic", "atomicrmw", and "cmpxchg" instructions that aren't supported by the target, emit libcalls to the new __atomic_* functions, (rather than the current behavior of calling the legacy __sync_* functions.)</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">Additionally, I'd like to move the decision to emit a libcall into AtomicExpandPass, and remove the ability to use Expand in ISel to create a libcall for ISD::ATOMIC_*. Putting the decision there allows querying the target, up front, for its supported sizes, before any other lowering decisions (either other parts of AtomicExpandPass and in ISel). When choosing whether to inline or libcall, only the size and alignment of the operation should be considered, and not the operation itself, or the type. This will ensure that the "all or none" property is adhered to.</div></div></blockquote><div><br></div><div>This sounds reasonable to me.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">(Q: what about the implementation of __atomic_is_lock_free/__atomic_always_lock_free in clang? The clang frontend can't query target information from the llvm backend, can it? Is there some other way to expose the information of what sizes are supported by a backend so that the clang builtins -- the latter of which must be a C++ constant expression -- can also use it? Or...just continue duplicating the info in both places, as is done today...)</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">A2) In clang, start removing the code that knows how to do optimized library call lowering for atomics -- always emit llvm atomic ops. Except for one case: clang will still need to emit library calls itself for data not aligned naturally for its size. The LLVM atomic instructions currently will not handle unaligned data, but unaligned data is allowed for the four "slab of memory" builtins (__atomic_load, __atomic_store, __atomic_compare_exchange, and __atomic_exchange).</div></div></blockquote><div><br></div><div>Hrm, why couldn't we get LLVM to do the heavy lifting for us?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)"><span>A3) In LLVM, add "align" attributes to cmpxchg and atomicrmw, and allow specifying "align" values for "load atomic" and "store atomic" (where the attribute currently exists but cannot be used). LLVM will then be able to lower misaligned accesses to library calls. In clang, get rid of special case for directly emitting libcalls for misaligned atomics, and lower to llvm instructions there, as well.</span></div></div></blockquote><div><br></div><div>Ah, that answers my question. Great!</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><span>A4) Hopefully, now, the three codepaths in clang will be sufficiently the same (as they're all now just creating llvm atomic instructions) that they can be trivially merged, or else they are so trivial that the remaining duplication is not worthwhile to merge.</span></div></div></blockquote><div><br></div><div>SGTM.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><div><br></div></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><div style="margin:0px;line-height:normal;min-height:14px"><br></div></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)"><b>Alternative B</b>: Move all libcall emission for atomics into Clang.</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">In this alternative, LLVM will <i>never</i> emit atomic libcalls from the atomic IR instructions. If the operation requested is not possible on a given target, that is an error. So, the cleanups here are to get clang to stop emitting LLVM IR atomics that cannot be lowered without libcalls.</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">B1) In Clang: have the legacy __sync_* builtins become essentially aliases for the __atomic_* builtins (thus they will emit calls to __atomic_* library functions when applicable).</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">B2) In Clang: Have the C11 _Atomic type support and openmp stuff call into the __atomic builtins' code to do their dirty work, instead of having its own separate implementation.</div><div style="margin:0px;line-height:normal;color:rgb(35,35,35);min-height:14px"><br></div><div style="margin:0px;line-height:normal;color:rgb(35,35,35)">B3) After those two changes, I believe clang will only ever emit atomic IR instructions when they can be lowered. So then, in LLVM: get rid of the fallback to libcalls to __sync_* from the atomic IR instructions. If an atomic IR operation is requested and not implementable lock-free, it will be an error, with no fallback.</div><div><br></div></div><br>_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
<br></blockquote></div><br></div></div>