[cfe-dev] Handling of __c11_atomic_is_lock_free({1, 2, 4, 8}) in compiler-rt atomic.c

Tue Mar 12 12:08:38 PDT 2019

Hi,

I think this more of a front-end issue than a compile-rt issue, but I’m also copying the llvm-dev list

In compiler-rt the file lib/builtins/atomic.c seems to rely on determining at compile time if an atomic operation of size 1, 2, 4, or 8 is always lock free.

For example, in the implementation of __atomic_load_8() we have something like this after macro expansion:

void __atomic_load_8(…)
{
  If (__c11_atomic_is_lock_free(8))
    return __c11_atomic_load_8(..)
/* otherwise lock-based implementation */
}

Let’s say a target supports lock-free atomics for 8-byte objects.  Then the front-end will lower __c11_atomic_is_lock_free(8) to “true” in clang/lib/AST/ExprConstant.cpp VisitBuiltinCallExpr() and further __c11_atomic_load_8() will expand into an inlined sequence of instructions.  Therefore, we get what we want which is an optimized implementation of __atomic_load_8().

But, let’s say the target is an ARM variant that does not support atomics for 8-byte objects.  The front-end will lower __c11_atomic_is_lock_free(8) to __atomic_is_lock_free(8) (note the “is” vs “always”) which is not lowered to “false” but instead remains as a function call.  __c11_atomic_load_8() is then lowered to a function call __atomic_load_8() since we are not inlining atomic operations on 8-byte objects.  The result is code like this:

void __atomic_load_8(…)
{
  If (__atomic_is_lock_free(8)). /* should always return false, And if it did return true we’d have an infinite recursive call. */
    return __atomic_load_8(..)
/* otherwise lock-based implementation */
}

That is fine if an __atomic_is_lock_free() is provided and it always returns false for 8 (BTW __atomic_is_lock_free is not implemented in compiler-rt), but this wastes code size and is inefficient.

Should atomic.c be calling __atomic_always_lock_free instead of __c11_atomic_is_lock_free to check for lock-free cases?  The “always” flavor is determined at compile time.

OR should the front-end always lower __c11_atomic_is_lock_free() to true or false for atomics on objects of size 1, 2, 4, or 8 bytes?

AND /OR should some sort of new TargetInfo routine be added to assert that certain sized operations are NEVER lock free?

There is a comment in the code in VisitBuiltinCallExpr() that seems to be biasing the implementation towards some X86-64 processor assumptions:

    // For __atomic_is_lock_free(sizeof(_Atomic(T))), if the size is a power
    // of two less than the maximum inline atomic width, we know it is
    // lock-free.  If the size isn't a power of two, or greater than the
    // maximum alignment where we promote atomics, we know it is not lock-free
    // (at least not in the sense of atomic_is_lock_free).  Otherwise,
    // the answer can only be determined at runtime; for example, 16-byte
    // atomics have lock-free implementations on some, but not all,
    // x86-64 processors.

BTW, I think that should be: “If the size isn’t a power of two or IS greater than the maximum…”.

Comments?

Beast Regards,
Eric Stotzer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190312/3b8bf542/attachment.html>