[cfe-dev] Handling of __c11_atomic_is_lock_free({1, 2, 4, 8}) in compiler-rt atomic.c
Stotzer, Eric via cfe-dev
cfe-dev at lists.llvm.org
Tue Mar 12 12:08:38 PDT 2019
Hi,
I think this is more of a front-end issue than a compiler-rt issue, but I’m also copying the llvm-dev list.
In compiler-rt the file lib/builtins/atomic.c seems to rely on determining at compile time if an atomic operation of size 1, 2, 4, or 8 is always lock free.
For example, in the implementation of __atomic_load_8() we have something like this after macro expansion:
void __atomic_load_8(…)
{
  if (__c11_atomic_is_lock_free(8))
    return __c11_atomic_load_8(..);
  /* otherwise lock-based implementation */
}
Let’s say a target supports lock-free atomics for 8-byte objects. Then the front-end will lower __c11_atomic_is_lock_free(8) to “true” in clang/lib/AST/ExprConstant.cpp VisitBuiltinCallExpr() and further __c11_atomic_load_8() will expand into an inlined sequence of instructions. Therefore, we get what we want which is an optimized implementation of __atomic_load_8().
But let’s say the target is an ARM variant that does not support lock-free atomics for 8-byte objects. The front-end will lower __c11_atomic_is_lock_free(8) to __atomic_is_lock_free(8) (note that this is the “is” flavor, not the “always” flavor), which is not lowered to “false” but instead remains as a function call. __c11_atomic_load_8() is then lowered to a call to __atomic_load_8(), since we are not inlining atomic operations on 8-byte objects. The result is code like this:
void __atomic_load_8(…)
{
  if (__atomic_is_lock_free(8)) /* should always return false; if it did return true we’d have an infinite recursive call */
    return __atomic_load_8(..);
  /* otherwise lock-based implementation */
}
That is fine if an __atomic_is_lock_free() is provided and it always returns false for 8 (BTW __atomic_is_lock_free is not implemented in compiler-rt), but this wastes code size and is inefficient.
Should atomic.c be calling __atomic_always_lock_free instead of __c11_atomic_is_lock_free to check for lock-free cases? The “always” flavor is determined at compile time.
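To make the first option concrete, here is a minimal sketch of a load guarded by the “always” builtin. It is not the actual compiler-rt code: demo_load_8 and demo_lock are hypothetical names, and the spin lock is only a stand-in for whatever lock-based path atomic.c really uses. The point is that __atomic_always_lock_free() is a compile-time constant, so the branch folds away and no call to __atomic_is_lock_free() should remain in the object file.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock used by the lock-based fallback. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Hypothetical helper for illustration; not the real __atomic_load_8(). */
static uint64_t demo_load_8(uint64_t *src)
{
    /* Evaluated at compile time. The second argument 0 means
       "assume typical alignment for an object of this size". */
    if (__atomic_always_lock_free(sizeof(uint64_t), 0))
        return __atomic_load_n(src, __ATOMIC_SEQ_CST);

    /* Otherwise: lock-based implementation (simple spin lock here). */
    while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
        ;
    uint64_t value = *src;
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
    return value;
}
```

On a target with lock-free 8-byte atomics only the first return should survive; on a target without them I would expect only the lock-based path to survive once the constant-false branch is dropped, though I have not checked that at every optimization level.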
OR should the front-end always lower __c11_atomic_is_lock_free() to true or false for atomics on objects of size 1, 2, 4, or 8 bytes?
AND/OR should some sort of new TargetInfo routine be added to assert that certain sized operations are NEVER lock free?
There is a comment in the code in VisitBuiltinCallExpr() that seems to be biasing the implementation towards some X86-64 processor assumptions:
// For __atomic_is_lock_free(sizeof(_Atomic(T))), if the size is a power
// of two less than the maximum inline atomic width, we know it is
// lock-free. If the size isn't a power of two, or greater than the
// maximum alignment where we promote atomics, we know it is not lock-free
// (at least not in the sense of atomic_is_lock_free). Otherwise,
// the answer can only be determined at runtime; for example, 16-byte
// atomics have lock-free implementations on some, but not all,
// x86-64 processors.
BTW, I think that should be: “If the size isn’t a power of two or IS greater than the maximum…”.
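For reference, the three-way decision that the comment describes can be modeled roughly as below. This is only a sketch of the comment's wording, not of what VisitBuiltinCallExpr() actually does: the enum, the function names, and the two limit parameters (sizes in bytes) are all hypothetical, and I am assuming the inline-width comparison is inclusive.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-way answer; these names are not Clang's. */
enum lock_free_answer { LF_ALWAYS, LF_NEVER, LF_RUNTIME };

static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* size, max_inline and max_promote are in bytes. The two limits are
   stand-ins for the target's maximum inline atomic width and the
   maximum width to which atomics are promoted. */
static enum lock_free_answer classify(size_t size, size_t max_inline,
                                      size_t max_promote)
{
    /* Not a power of two, or greater than the promote limit: never. */
    if (!is_power_of_two(size) || size > max_promote)
        return LF_NEVER;
    /* Power of two within the inline limit (assumed inclusive): always. */
    if (size <= max_inline)
        return LF_ALWAYS;
    /* In between: only known at runtime (e.g. 16 bytes on x86-64). */
    return LF_RUNTIME;
}
```

Under this model, a target whose two limits are both 4 bytes would get LF_NEVER for size 8, which is the compile-time “false” I am asking about for the ARM case.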
Comments?
Best Regards,
Eric Stotzer