<div dir="ltr">I don't think it would be a blocker. You can certainly construct a test case where pthread_self is in the critical path (imagine a macro that uses pthread_self to do some sort of TLS access), but it seems unlikely. If coroutines/fibers had always existed and pthread_self was not marked `const`, I don't think we would be spending the time and effort to add a new optimization attribute to optimize it.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 20, 2020 at 10:36 AM Xun Li <<a href="mailto:lxfind@gmail.com">lxfind@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Reid,<br>

<br>

Thanks for the suggestion. That's a good idea.<br>

One concern would be, when this new fiber-safe TLS option is enabled,<br>

pthread_self() will not be optimized even in functions where no<br>

coroutine is used. Do you think that would be a blocker?<br>

<br>

On Fri, Nov 20, 2020 at 7:41 AM Reid Kleckner <<a href="mailto:rnk@google.com" target="_blank">rnk@google.com</a>> wrote:<br>

><br>

> This calls to mind the MSVC /GT option for fiber-safe TLS:<br>

> <a href="https://docs.microsoft.com/en-us/cpp/build/reference/gt-support-fiber-safe-thread-local-storage?view=msvc-160" rel="noreferrer" target="_blank">https://docs.microsoft.com/en-us/cpp/build/reference/gt-support-fiber-safe-thread-local-storage?view=msvc-160</a><br>

> It seems reasonable to implement something similar in LLVM to solve the problem of coroutines and TLS.<br>

><br>

> For pthread_self, instead of inventing a new attribute, would it be enough for clang to ignore the attribute when this new fiber-safe TLS option is enabled?<br>

><br>

> On Wed, Nov 18, 2020 at 2:07 PM Xun Li via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

>><br>

>> Hi,<br>

>><br>

>> I would like to propose a potential solution to a bug that involves<br>

>> coroutine and pthread_self().<br>

>><br>

>> Description of the bug can be found in<br>

>> <a href="https://bugs.llvm.org/show_bug.cgi?id=47833" rel="noreferrer" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=47833</a>. Below is a summary:<br>

>> pthread_self() from glibc is defined with "__attribute__<br>

>> ((__const__))". The const attribute tells the compiler that it does<br>

>> not read nor write any global state and hence always return the same<br>

>> result. Hence in the following code:<br>

>><br>

>> auto x1 = pthread_self();<br>

>> ...<br>

>> auto x2 = pthread_self();<br>

>><br>

>> the second call to pthread_self() can be optimized out. This has been<br>

>> correct until coroutines. With coroutines, we can have code like this:<br>

>><br>

>> auto x1 = pthread_self();<br>

>> co_await ...<br>

>> auto x2 = pthread_self();<br>

>><br>

>> Now because of the co_await, the function can suspend and resume in a<br>

>> different thread, in which case the second call to pthread_self()<br>

>> should return a different result than the first one. Unfortunately<br>

>> LLVM will still optimize out the second call in the case of<br>

>> coroutines.<br>

>><br>

>> I tried to just nuke all value reuse whenever a coro.suspend is seen<br>

>> in all CSE-related passes (<a href="https://reviews.llvm.org/D89711" rel="noreferrer" target="_blank">https://reviews.llvm.org/D89711</a>), but it<br>

>> doesn't seem scalable and it puts burden on pass writers. So I would<br>

>> like to propose a new solution.<br>

>><br>

>> Proposed Solution:<br>

>> First of all, we need to update the Clang front-end to special handle<br>

>> the attributes of pthread_self function: replace the ConstAttr<br>

>> attribute of pthread_self with a new attribute, say "ThreadConstAttr".<br>

>> Next, in the emitted IR, functions with "ThreadConstAttr" will have a<br>

>> new IR attribute, say "thread_readnone".<br>

>> Finally, there are two possible sub-solutions to handle this new IR attribute:<br>

>> a) We add a new Pass after CoroSplitPass that changes all the<br>

>> "thread_readnone" attributes back to "readnone". This will allow it to<br>

>> work properly prior to CoroSplit, and still provide a chance to do CSE<br>

>> after CoroSplit. This approach is simplest to implement.<br>

>> b) We never remove "thread_readnone". However, we teach memory alias<br>

>> analysis to understand that functions with "thread_readnone" attribute<br>

>> will only interfere with coro.suspend intrinsics but nothing else.<br>

>> Hopefully this will still enable CSE. Not sure how feasible this is.<br>

>><br>

>> Does the above solution (esp (a)) sound reasonable? Any feedback is<br>

>> appreciated. Thank you!<br>

>><br>

>> A related issue, which may require separate solutions, is that<br>

>> coroutine also does not work properly with thread local storage. This<br>

>> is because access to thread local storage in LLVM IR is simply a<br>

>> reference. However the address to such reference can change after a<br>

>> coro.suspend. This is not taken care of today.<br>

>> In this thread I would like to focus on the issue with pthread_self<br>

>> first, but it's good to have context regarding the thread local<br>

>> storage issue when discussing solution space.<br>

>> --<br>

>> Xun<br>

>> _______________________________________________<br>

>> LLVM Developers mailing list<br>

>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

<br>

<br>

<br>

-- <br>

Xun<br>

</blockquote></div>