[llvm-dev] [RFC] Coroutine and pthread_self

Mon Nov 23 15:20:01 PST 2020

James,

I made a partial attempt in https://reviews.llvm.org/D89711, and there
is some discussion there.
But it seems that we would have to teach maybe a dozen analyses that
perform some form of CSE on function calls, and any future analysis
would need to be careful about this as well. Would this approach be too
fragile?
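
To make the concern concrete, here is a toy sketch of what "teaching
one analysis" amounts to. It is an illustration only: it is not code
from D89711 and uses no LLVM APIs, and Op, redundant_calls and
respect_suspend are names made up for this example. It models a
straight-line function body as a list of calls to const functions and
suspend points, and reports which calls a CSE-like pass would fold into
an earlier one.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One step of a toy straight-line function body (hypothetical model).
struct Op {
  enum Kind { CallConst, Suspend } kind;
  std::string callee;  // only meaningful for CallConst
};

// Returns the indices of calls that a CSE-like pass would replace with
// an earlier result. With respect_suspend == false this mimics the
// behaviour described in the bug report; with true, every available
// call result is forgotten at each suspend point.
std::vector<std::size_t> redundant_calls(const std::vector<Op> &body,
                                         bool respect_suspend) {
  std::set<std::string> available;  // callees whose result is in hand
  std::vector<std::size_t> redundant;
  for (std::size_t i = 0; i < body.size(); ++i) {
    const Op &op = body[i];
    if (op.kind == Op::Suspend) {
      if (respect_suspend)
        available.clear();  // the coroutine may resume on another thread
      continue;
    }
    // insert() reports false when the callee was already available,
    // which is exactly the case a CSE pass would fold away.
    if (!available.insert(op.callee).second)
      redundant.push_back(i);
  }
  return redundant;
}
```

For a body of the form call, suspend, call, the unmodified version
reports the second call as redundant, while the suspend-aware version
reports nothing. Every pass that keeps its own notion of "available"
values would need the equivalent of that one clear(), which is where
the fragility comes from.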

On Mon, Nov 23, 2020 at 2:08 PM James Y Knight <jyknight at google.com> wrote:
>
> Special handling for pthread_self doesn't seem reasonable -- all other functions marked '__attribute__((const))' or 'readnone' have the exact same problem.
>
> As you suggest, the same issue occurs with TLS variables. In particular, a normal, non-coroutine function which returns the address of a TLS variable (e.g. `__thread int t; int* foo() { return &t; }`) is clearly "readnone", yet returns a different value per thread, and therefore a call to it must not be moved across a coroutine suspend point.
>
> I think this means that LLVM just needs to be taught that certain analyses are invalidated by a coroutine suspend point, or else entirely give up on certain analyses or optimizations on functions which contain any un-lowered coroutine suspend points.
>
>
> On Wed, Nov 18, 2020, 5:07 PM Xun Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> I would like to propose a potential solution to a bug that involves
>> coroutine and pthread_self().
>>
>> Description of the bug can be found in
>> https://bugs.llvm.org/show_bug.cgi?id=47833. Below is a summary:
>> pthread_self() from glibc is declared with "__attribute__
>> ((__const__))". The const attribute tells the compiler that the
>> function neither reads nor writes any global state and hence always
>> returns the same result. Hence in the following code:
>>
>> auto x1 = pthread_self();
>> ...
>> auto x2 = pthread_self();
>>
>> the second call to pthread_self() can be optimized out. This was
>> correct until coroutines. With coroutines, we can have code like this:
>>
>> auto x1 = pthread_self();
>> co_await ...
>> auto x2 = pthread_self();
>>
>> Now because of the co_await, the function can suspend and resume in a
>> different thread, in which case the second call to pthread_self()
>> should return a different result than the first one. Unfortunately
>> LLVM will still optimize out the second call in the case of
>> coroutines.
>>
>> I tried to just nuke all value reuse whenever a coro.suspend is seen
>> in all CSE-related passes (https://reviews.llvm.org/D89711), but it
>> doesn't seem scalable and it puts a burden on pass writers. So I would
>> like to propose a new solution.
>>
>> Proposed Solution:
>> First of all, we need to update the Clang front-end to specially handle
>> the attributes of the pthread_self function: replace the ConstAttr
>> attribute of pthread_self with a new attribute, say "ThreadConstAttr".
>> Next, in the emitted IR, functions with "ThreadConstAttr" will have a
>> new IR attribute, say "thread_readnone".
>> Finally, there are two possible sub-solutions to handle this new IR attribute:
>> a) We add a new Pass after CoroSplitPass that changes all the
>> "thread_readnone" attributes back to "readnone". This will allow it to
>> work properly prior to CoroSplit, and still provide a chance to do CSE
>> after CoroSplit. This approach is simplest to implement.
>> b) We never remove "thread_readnone". However, we teach memory alias
>> analysis to understand that functions with "thread_readnone" attribute
>> will only interfere with coro.suspend intrinsics but nothing else.
>> Hopefully this will still enable CSE. Not sure how feasible this is.
>>
>> Does the above solution (especially (a)) sound reasonable? Any feedback is
>> appreciated. Thank you!
>>
>> A related issue, which may require a separate solution, is that
>> coroutines also do not work properly with thread-local storage. This
>> is because an access to thread-local storage in LLVM IR is simply a
>> reference. However, the address behind such a reference can change
>> after a coro.suspend. This is not handled today.
>> In this thread I would like to focus on the issue with pthread_self
>> first, but it's good to have context regarding the thread-local
>> storage issue when discussing the solution space.
>> --
>> Xun

-- 
Xun