[llvm-dev] [RFC] Coroutine and pthread_self

Mon Nov 30 10:06:45 PST 2020

To fix TLS variables in coroutines, we need a bit more than the fix
for the pthread_self issue.
For example, assuming we have a TLS variable and a function that looks
like this:

@tls_variable = internal thread_local global i32 0, align 4

define i32* @foo(){
  ret i32* @tls_variable
}

Even if @foo does not have a "readnone" attribute, it will still lead
to issues during inlining in coroutines because @tls_variable is just
a value like any other value. For callers looks like this:

%tls1 = call i32* @foo()
..coro.suspend..
%tls2 = call i32* @foo()
%cond = icmp eq i32* %tls1, %tls2
...

When inlining happens, all uses of %tls1 and %tls2 will simply be
replaced with @tls_variable.

To fix that we might have to introduce a new intrinsics for accessing
TLS variables such that they don't get replaced directly. Such
intrinsics function will be similar to pthread_self: it depends on
thread identity but nothing else.
And then from there we can rely on CSE to properly handle the optimizations.

On Wed, Nov 25, 2020 at 3:50 PM James Y Knight via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
>
>
> On Wed, Nov 25, 2020 at 10:13 AM Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> On Wed, Nov 25, 2020 at 12:06 AM Joerg Sonnenberger via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > On Tue, Nov 24, 2020 at 04:52:32PM -0500, James Y Knight wrote:
>> > > In any case, what it lowers to in Clang is the LLVM IR function-attribute
>> > > readnone, which I'd argue is even more clearly correct to use on an LLVM
>> > > function returning the address of a TLS global variable. Note that llvm
>> > > infers that property via the FunctionAttrs pass, e.g. for this function:
>> > > @t = thread_local global i32 0, align 4
>> > > define i32* @foo() {
>> > > ret i32* @t
>> > > }
>> >
>> > I don't see how that is a valid transformation under the definition in
>> > the language reference either. I also don't believe that this is the
>> > only situation it can happen, I would expect OpenMP to expose similar
>> > issues.
>
>
> Clang/LLVM OpenMP support doesn't expose similar issues, because during the design work in 2012, the explicit choice was made to outline the parallel regions to a separate function up-front. It does not expose an llvm IR intrinsic which switches to another (or multiple!) thread of execution. The ultimately-unsuccessful competing proposals did propose that, and were rejected because of it.
>
> That choice greatly reduced the ability to optimize OpenMP parallel loops (e.g., no LICM, even where it'd be correct!). But, this was done precisely in order to avoid having to teach all the LLVM optimization passes about the rules for optimizing a function body running on multiple threads simultaneously.
>
>>
>> The tricky part is that the "thread self identity" is a piece of state
>> that used to be immutable, but with the introduction of coroutines, it
>> has become mutable. The example @foo reads the "thread self identity"
>> piece of state. As long as that state was immutable, it was correct to
>> mark the function as readnone. Now that the state has become mutable,
>> this is no longer correct.
>>
>> That may seem annoying, but I suspect it's best to just swallow that
>> pill and take it to its logical conclusion.
>>
>> Doing so ends up treating pthread_self(), @foo, etc. more
>> conservatively, but correctly. Treating them correctly and less
>> conservatively is really a kind of alias analysis problem. I would
>> hesitate to just add a new ad-hoc attribute for it, we already have so
>> many of those, and would prefer adding a mechanism for this kind of
>> thing generically.
>
>
> While it is certainly important to correctly model the fiction that the coroutine intrinsics present, I do think it's important to remember that they are a fiction within the compiler. Yes, in the high-level coroutine IR, before CoroSplit runs, thread identity acts as if it were mutable across the "llvm.coro.suspend" intrinsic. However, the suspend intrinsic is simply a placeholder for the transform which will ultimately occur, to turn the entire function body inside-out. This is an important distinction, because there remains no way to "secretly" mutate the thread identity. E.g. calling a function cannot result in the thread identity of the caller being mutated.
>
> That we've introduced this set of coroutine intrinsics (and the CoroSplit pass to flip a function inside out) does not imply that we ought to change the meaning of 'readnone' retroactively to mean that it needs to be invariant on thread-identity. We should continue to model it as meaning "doesn't read mutable memory or mutable state, but may depend on the thread identity", which is what the meaning has been up until now, even if not explicitly stated. Given the existence of an intrinsic which "changes" the thread identity, we should, now, clarify the documentation.
>
> We also do need to fix any optimization passes that run prior to the CoroSplit pass understand the effect that a coroutine suspend-point has on thread-identity. Note that we need to do much of the same work anyways, to fix access to TLS variables within a coroutine. (Or, as an alternative, we could move the CoroSplit pass first in the optimization pipeline, so that there are no such optimization passes which run before it. This would fix the correctness issues, but greatly reduce the potential for optimization -- similar to OpenMP)
>
> Finally, it turns out to be important, it may be reasonable to add a new function attribute to indicate "doesn't depend on current thread identity", which could unlock additional optimization potential even across a suspend-point. But, I would not suggest actually adding it, unless a real need has been identified and justified.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Xun