[PATCH] D89711: [Coroutine] Prevent value reusing across coroutine suspensions in EarlyCSE and GVN

Tue Oct 20 12:29:24 PDT 2020

lxfind added a comment.

In D89711#2339528 <https://reviews.llvm.org/D89711#2339528>, @efriedma wrote:

>> but that would also mean we would never be able to optimize out redundant pthread_self() calls
>
> We can probably mess with alias analysis so it understands that pthread_self doesn't alias operations other than calls to a coroutine suspend; that should be enough to recover the relevant optimizations.  Not sure if we want to add some sort of IR attribute, or just special-case that specific library call using TargetLibraryInfo.
>
>> And yes thread local variables are another set of problems. I don't have a solution yet on how to handle them.
>
> We probably need an intrinsic that computes the runtime address of a thread-local variable, so we compute the address at some specific point in the function.

After thinking about it more:
First of all, we cannot drop the readnone tag in the definition of pthread_self in Clang, the regression in the non-coroutine cases are likely unacceptable and they should pay for it if not using coroutines.
Secondly, because one can call pthread_self through indirect function calls, hence just checking for pthread_self in coroutines is not sufficient. Instead, in a coroutine function, we never want to reuse the results of function calls.
There doesn't seem to be a way to tag a callsite that it might access memory (except through operand bundles, which doesn't seem to fit here), so it seems to me there are only two possible solutions:

1. Rewrite Clang frontend for coroutine so that it directly emit multiple functions for each suspension region. It eliminates the problem but then optimizing across multiple functions that in fact belong to one will be quite challenging and the change will be very significant.
2. In all the relevant passes that would reuse call results (EarlyCSE and GVN as far as I know, but do let me know if there are others), do not reuse call results within a coroutine.

Since the first solution is way to heavy and has a lot of downsides, the second solution seems the way to go. It will be basically along the shape of this patch, but limit the damage to only call result sharing, not other expressions. What do you think?

In D89711#2342389 <https://reviews.llvm.org/D89711#2342389>, @efriedma wrote:

>> First of all, we cannot drop the readnone tag in the definition of pthread_self in Clang, the regression in the non-coroutine cases are likely unacceptable and they shouldn't pay for it if not using coroutines.
>
> Like I noted before, we can recover the optimization power by special-casing it in alias analysis.  For almost all optimizations, it doesn't matter that it reads memory if that read doesn't actually alias anything.
>
>> Secondly, because one can call pthread_self through indirect function calls, hence just checking for pthread_self in coroutines is not sufficient. Instead, in a coroutine function, we never want to reuse the results of function calls.
>
> This depends on what we decide the "const" attribute actually means, in the context of coroutines.  We could decide that const actually means the value can't depend on the thread ID, then treat fixing up the libc header as a compatibility hack.  Or we could decide that "const" is actually allowed to change across a coroutine suspend, in which case we need to do something more invasive.
>
> Either way, I'd still prefer that the IR readnone exclude functions that behave like pthread_self, I think.
>
>> In all the relevant passes that would reuse call results (EarlyCSE and GVN as far as I know, but do let me know if there are others), do not reuse call results within a coroutine.
>
> We also have a few other GVN-ish passes: NewGVN, GVNHoist and GVNSink. SimplifyCFG does a form of CSE, although I'm not sure it actually affects this case.  LICM also reuses call results in a sense that's relevant here.
>
> Even if we do catch all the cases that are relevant right now, having an "extra" form of memory access that isn't visible to alias analysis makes understanding existing code, and writing new code, harder.
>
> -------
>
> I think this discussion is getting to the point where it would benefit from a wider audience on llvm-dev.

Thanks. I think I get it now. I will write up something in more details and post in llvm-dev.
To summarize:

1. In Clang we special handle the `pthread_self()` function declaration and remove the `readnone` attribute.
2. In Alias Analysis we make it such that `pthread_self()` only interferes with Coroutine suspension intrinsics.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89711/new/

https://reviews.llvm.org/D89711