<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I think this more of a front-end issue than a compile-rt issue, but I’m also copying the llvm-dev list<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">In compiler-rt the file lib/builtins/atomic.c seems to rely on determining at compile time if an atomic operation of size 1, 2, 4, or 8 is always lock free.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">For example, in the implementation of __atomic_load_8() we have something like this after macro expansion:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">void __atomic_load_8(…)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> If (__c11_atomic_is_lock_free(8))<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> return __c11_atomic_load_8(..)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">/* otherwise lock-based implementation */<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Let’s say a target supports lock-free atomics for 8-byte objects. Then the front-end will lower __c11_atomic_is_lock_free(8) to “true” in clang/lib/AST/ExprConstant.cpp VisitBuiltinCallExpr() and further
__c11_atomic_load_8() will expand into an inlined sequence of instructions. Therefore, we get what we want which is an optimized implementation of __atomic_load_8().<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">But, let’s say the target is an ARM variant that does not support atomics for 8-byte objects. The front-end will lower __c11_atomic_is_lock_free(8) to __atomic_is_lock_free(8) (note the “is” vs “always”)
which is not lowered to “false” but instead remains as a function call. __c11_atomic_load_8() is then lowered to a function call __atomic_load_8() since we are not inlining atomic operations on 8-byte objects. The result is code like this:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">void __atomic_load_8(…)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> If (__atomic_is_lock_free(8)). /* should always return false, And if it did return true we’d have an infinite recursive call. */<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> return __atomic_load_8(..)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">/* otherwise lock-based implementation */<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">That is fine if an __atomic_is_lock_free() is provided and it always returns false for 8 (BTW __atomic_is_lock_free is not implemented in compiler-rt), but this wastes code size and is inefficient.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Should atomic.c be calling __atomic_always_lock_free instead of __c11_atomic_is_lock_free to check for lock-free cases? The “always” flavor is determined at compile time.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">OR should the front-end always lower __c11_atomic_is_lock_free() to true or false for atomics on objects of size 1, 2, 4, or 8 bytes?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">AND /OR should some sort of new TargetInfo routine be added to assert that certain sized operations are NEVER lock free?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">There is a comment in the code in VisitBuiltinCallExpr() that seems to be biasing the implementation towards some X86-64 processor assumptions:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> // For __atomic_is_lock_free(sizeof(_Atomic(T))), if the size is a power<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> // of two less than the maximum inline atomic width, we know it is<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> // lock-free. If the size isn't a power of two, or greater than the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> // maximum alignment where we promote atomics, we know it is not lock-free<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> // (at least not in the sense of atomic_is_lock_free). Otherwise,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> // the answer can only be determined at runtime; for example, 16-byte<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> // atomics have lock-free implementations on some, but not all,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> // x86-64 processors.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">BTW, I think that should be: “If the size isn’t a power of two or IS greater than the maximum…”.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Comments?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Beast Regards, <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Eric Stotzer <o:p></o:p></span></p>
</div>
</body>
</html>