<div dir="ltr">I don't think it would be a blocker. You can certainly construct a test case where pthread_self is in the critical path (imagine a macro that uses pthread_self to do some sort of TLS access), but it seems unlikely. If coroutines/fibers had always existed and pthread_self was not marked `const`, I don't think we would be spending the time and effort to add a new optimization attribute to optimize it.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 20, 2020 at 10:36 AM Xun Li <<a href="mailto:lxfind@gmail.com">lxfind@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Reid,<br>
<br>
Thanks for the suggestion. That's a good idea.<br>
One concern would be, when this new fiber-safe TLS option is enabled,<br>
pthread_self() will not be optimized even in functions where no<br>
coroutine is used. Do you think that would be a blocker?<br>
<br>
On Fri, Nov 20, 2020 at 7:41 AM Reid Kleckner <<a href="mailto:rnk@google.com" target="_blank">rnk@google.com</a>> wrote:<br>
><br>
> This calls to mind the MSVC /GT option for fiber-safe TLS:<br>
> <a href="https://docs.microsoft.com/en-us/cpp/build/reference/gt-support-fiber-safe-thread-local-storage?view=msvc-160" rel="noreferrer" target="_blank">https://docs.microsoft.com/en-us/cpp/build/reference/gt-support-fiber-safe-thread-local-storage?view=msvc-160</a><br>
> It seems reasonable to implement something similar in LLVM to solve the problem of coroutines and TLS.<br>
><br>
> For pthread_self, instead of inventing a new attribute, would it be enough for clang to ignore the attribute when this new fiber-safe TLS option is enabled?<br>
><br>
> On Wed, Nov 18, 2020 at 2:07 PM Xun Li via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
>><br>
>> Hi,<br>
>><br>
>> I would like to propose a potential solution to a bug that involves<br>
>> coroutine and pthread_self().<br>
>><br>
>> Description of the bug can be found in<br>
>> <a href="https://bugs.llvm.org/show_bug.cgi?id=47833" rel="noreferrer" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=47833</a>. Below is a summary:<br>
>> pthread_self() from glibc is defined with "__attribute__<br>
>> ((__const__))". The const attribute tells the compiler that it does<br>
>> not read nor write any global state and hence always return the same<br>
>> result. Hence in the following code:<br>
>><br>
>> auto x1 = pthread_self();<br>
>> ...<br>
>> auto x2 = pthread_self();<br>
>><br>
>> the second call to pthread_self() can be optimized out. This has been<br>
>> correct until coroutines. With coroutines, we can have code like this:<br>
>><br>
>> auto x1 = pthread_self();<br>
>> co_await ...<br>
>> auto x2 = pthread_self();<br>
>><br>
>> Now because of the co_await, the function can suspend and resume in a<br>
>> different thread, in which case the second call to pthread_self()<br>
>> should return a different result than the first one. Unfortunately<br>
>> LLVM will still optimize out the second call in the case of<br>
>> coroutines.<br>
>><br>
>> I tried to just nuke all value reuse whenever a coro.suspend is seen<br>
>> in all CSE-related passes (<a href="https://reviews.llvm.org/D89711" rel="noreferrer" target="_blank">https://reviews.llvm.org/D89711</a>), but it<br>
>> doesn't seem scalable and it puts burden on pass writers. So I would<br>
>> like to propose a new solution.<br>
>><br>
>> Proposed Solution:<br>
>> First of all, we need to update the Clang front-end to special handle<br>
>> the attributes of pthread_self function: replace the ConstAttr<br>
>> attribute of pthread_self with a new attribute, say "ThreadConstAttr".<br>
>> Next, in the emitted IR, functions with "ThreadConstAttr" will have a<br>
>> new IR attribute, say "thread_readnone".<br>
>> Finally, there are two possible sub-solutions to handle this new IR attribute:<br>
>> a) We add a new Pass after CoroSplitPass that changes all the<br>
>> "thread_readnone" attributes back to "readnone". This will allow it to<br>
>> work properly prior to CoroSplit, and still provide a chance to do CSE<br>
>> after CoroSplit. This approach is simplest to implement.<br>
>> b) We never remove "thread_readnone". However, we teach memory alias<br>
>> analysis to understand that functions with "thread_readnone" attribute<br>
>> will only interfere with coro.suspend intrinsics but nothing else.<br>
>> Hopefully this will still enable CSE. Not sure how feasible this is.<br>
>><br>
>> Does the above solution (esp (a)) sound reasonable? Any feedback is<br>
>> appreciated. Thank you!<br>
>><br>
>> A related issue, which may require separate solutions, is that<br>
>> coroutine also does not work properly with thread local storage. This<br>
>> is because access to thread local storage in LLVM IR is simply a<br>
>> reference. However the address to such reference can change after a<br>
>> coro.suspend. This is not taken care of today.<br>
>> In this thread I would like to focus on the issue with pthread_self<br>
>> first, but it's good to have context regarding the thread local<br>
>> storage issue when discussing solution space.<br>
>> --<br>
>> Xun<br>
>> _______________________________________________<br>
>> LLVM Developers mailing list<br>
>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
<br>
<br>
<br>
-- <br>
Xun<br>
</blockquote></div>