[llvm-dev] [RFC] Coroutine and pthread_self

Xun Li via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 18 14:06:56 PST 2020


Hi,

I would like to propose a potential solution to a bug that involves
coroutines and pthread_self().

A description of the bug can be found in
https://bugs.llvm.org/show_bug.cgi?id=47833. Below is a summary:
pthread_self() from glibc is declared with "__attribute__
((__const__))". The const attribute tells the compiler that the
function neither reads nor writes any global state and hence always
returns the same result. So in the following code:

auto x1 = pthread_self();
...
auto x2 = pthread_self();

the second call to pthread_self() can be optimized out. This was
correct before coroutines. With coroutines, we can have code like this:

auto x1 = pthread_self();
co_await ...
auto x2 = pthread_self();

Now because of the co_await, the function can suspend and resume in a
different thread, in which case the second call to pthread_self()
should return a different result than the first one. Unfortunately
LLVM will still optimize out the second call in the case of
coroutines.
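
To make the failure mode concrete, here is a minimal sketch that
imitates a suspended-and-resumed coroutine by hand, without using
co_await. The Frame struct and the functions run_until_suspend, resume
and same_thread_after_resume are hypothetical names made up for this
illustration; they are not what CoroSplit actually emits. The part
before the suspension point runs on the calling thread and the part
after it runs on a new thread, so the two pthread_self() results
differ:

```cpp
#include <pthread.h>
#include <thread>

// Hypothetical hand-written stand-in for a coroutine frame: it holds the
// values that must survive across the suspension point.
struct Frame {
    pthread_t before{};  // x1: thread id observed before the "co_await"
    pthread_t after{};   // x2: thread id observed after resumption
};

// The part of the function that runs up to the suspension point.
void run_until_suspend(Frame &f) { f.before = pthread_self(); }

// The part of the function that runs after resumption.
void resume(Frame &f) { f.after = pthread_self(); }

// Starts the "coroutine" on the calling thread, resumes it on a new
// thread, and reports whether both pthread_self() results are equal.
bool same_thread_after_resume() {
    Frame f;
    run_until_suspend(f);
    std::thread worker([&f] { resume(f); });  // resume on another thread
    worker.join();
    return pthread_equal(f.before, f.after) != 0;
}
```

Here same_thread_after_resume() returns false, because the calling
thread is still alive while the worker runs. If the second
pthread_self() call were replaced by the value of the first one, it
would wrongly return true.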

I tried to just nuke all value reuse whenever a coro.suspend is seen
in all CSE-related passes (https://reviews.llvm.org/D89711), but it
doesn't seem scalable and it puts a burden on pass writers. So I would
like to propose a new solution.

Proposed Solution:
First of all, we need to update the Clang front-end to special-case
the attributes of the pthread_self function: replace the ConstAttr
attribute of pthread_self with a new attribute, say "ThreadConstAttr".
Next, in the emitted IR, functions with "ThreadConstAttr" will have a
new IR attribute, say "thread_readnone".
Finally, there are two possible sub-solutions to handle this new IR
attribute:
a) We add a new pass after CoroSplitPass that changes all the
"thread_readnone" attributes back to "readnone". This keeps the code
correct prior to CoroSplit, and still provides a chance to do CSE
after CoroSplit. This approach is the simplest to implement.
b) We never remove "thread_readnone". Instead, we teach alias
analysis to understand that functions with the "thread_readnone"
attribute interfere only with coro.suspend intrinsics and nothing
else. Hopefully this will still enable CSE. I am not sure how feasible
this is.

Does the above solution (especially (a)) sound reasonable? Any
feedback is appreciated. Thank you!

A related issue, which may require a separate solution, is that
coroutines also do not work properly with thread local storage. This
is because an access to thread local storage in LLVM IR is simply a
reference to the variable. However, the address behind such a
reference can change after a coro.suspend if the coroutine resumes on
a different thread. This is not taken care of today.
In this thread I would like to focus on the issue with pthread_self
first, but it's good to have context regarding the thread local
storage issue when discussing the solution space.
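
For context on the thread local storage issue, here is a minimal
sketch of why an address computed before a suspension point cannot be
reused after it. The names tls_counter and
tls_address_stable_across_resume are hypothetical and exist only for
this illustration; the worker thread stands in for the code that runs
after the coroutine resumes elsewhere:

```cpp
#include <thread>

// Hypothetical thread-local variable; every thread has its own copy.
thread_local int tls_counter = 0;

// Takes the address of tls_counter on the calling thread (the part
// "before the suspension"), then compares it with the address seen on
// a second thread (the part "after resumption").
bool tls_address_stable_across_resume() {
    int *before = &tls_counter;  // address as seen before "suspension"
    bool same = true;
    // Compare inside the worker, while both thread-local copies are alive.
    std::thread worker([&] { same = (&tls_counter == before); });
    worker.join();
    return same;
}
```

The function returns false: the two threads see different addresses
for the same variable, so an address cached across the suspension
point would refer to the wrong thread's copy.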
-- 
Xun

