[PATCH] D89711: [Coroutine] Prevent value reusing across coroutine suspensions in EarlyCSE and GVN

Mon Oct 19 16:49:09 PDT 2020

lxfind added a comment.

In D89711#2339528 <https://reviews.llvm.org/D89711#2339528>, @efriedma wrote:

>> but that would also mean we would never be able to optimize out redundant pthread_self() calls
>
> We can probably mess with alias analysis so it understands that pthread_self doesn't alias operations other than calls to a coroutine suspend; that should be enough to recover the relevant optimizations.  Not sure if we want to add some sort of IR attribute, or just special-case that specific library call using TargetLibraryInfo.
>
>> And yes thread local variables are another set of problems. I don't have a solution yet on how to handle them.
>
> We probably need an intrinsic that computes the runtime address of a thread-local variable, so we compute the address at some specific point in the function.

After thinking about it more:
First of all, we cannot drop the readnone attribute from the definition of pthread_self in Clang: the regressions in the non-coroutine cases are likely unacceptable, and code that does not use coroutines should not have to pay for it.
Secondly, because pthread_self can be called through indirect function calls, just checking for pthread_self in coroutines is not sufficient. Instead, in a coroutine function, we never want to reuse the results of function calls.
There doesn't seem to be a way to tag a call site as possibly accessing memory (except through operand bundles, which don't seem to fit here), so it seems to me there are only two possible solutions:

1. Rewrite the Clang frontend for coroutines so that it directly emits a separate function for each suspension region. This eliminates the problem, but optimizing across multiple functions that in fact belong to one would be quite challenging, and the change would be very significant.
2. In all the relevant passes that would reuse call results (EarlyCSE and GVN as far as I know, but do let me know if there are others), do not reuse call results within a coroutine.

Since the first solution is way too heavy and has a lot of downsides, the second solution seems to be the way to go. It would basically follow the shape of this patch, but limit the damage to call result sharing only, not other expressions. What do you think?
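
To make the second option concrete, here is a rough sketch of the intended behavior as a toy model. It is not the actual EarlyCSE or GVN code and uses no LLVM classes: ToyInst, toyCSE and the isCoroutine flag are hypothetical names made up for illustration, and every instruction is assumed to be readnone so that only the call/non-call distinction matters.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction (not an LLVM type).
// `key` is a canonical spelling of the expression, e.g. "add %a, %b".
struct ToyInst {
    std::string key;
    bool isCall;
};

// For each instruction, returns the index of the instruction whose value
// it ends up using: an earlier identical one if it was CSE'd, else itself.
// Assumption: every instruction is readnone, so identical keys normally
// mean identical values.
std::vector<std::size_t> toyCSE(const std::vector<ToyInst>& body,
                                bool isCoroutine) {
    std::map<std::string, std::size_t> available;  // key -> first index
    std::vector<std::size_t> uses;
    for (std::size_t i = 0; i < body.size(); ++i) {
        const ToyInst& inst = body[i];
        // The proposed rule: inside a coroutine, call results are never
        // shared; non-call expressions are still fair game.
        const bool mayReuse = !(isCoroutine && inst.isCall);
        auto it = available.find(inst.key);
        if (mayReuse && it != available.end()) {
            assert(it->second < i);  // only earlier values can be reused
            uses.push_back(it->second);
            continue;
        }
        if (mayReuse)
            available.emplace(inst.key, i);
        uses.push_back(i);
    }
    return uses;
}
```

With a body of "call @pthread_self()", "add %a, %b", "call @pthread_self()", "add %a, %b", the sketch folds both repeated instructions in a regular function, but only the repeated add when isCoroutine is true. Note that the model ignores where the suspend points are; it simply refuses to share any call result inside a coroutine, which is the conservative behavior described in option 2.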

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89711/new/

https://reviews.llvm.org/D89711