[PATCH] D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions

Mon Feb 22 10:27:11 PST 2021

lxfind added a comment.

> LICM moves the memory operations out of the loop, which does reduce the number of memory operations. More importantly, I agree with @efriedma that we should either have a general solution or document these restrictions, rather than fix them case by case.

Let me elaborate in more detail:
This patch does not attempt to disable all of LICM in the presence of coroutines. Instead, it disables one specific part of LICM: promoting memory references to scalars.
That promotion works by sinking stores out of the loop and moving loads to before the loop. Let's look at each of the two cases:

1. Sinking stores out of the loop. LICM sinks stores by turning the in-loop memory stores into writes to a scalar, and then storing that scalar to memory once, outside the loop. The important point is that this introduces a scalar that must stay alive until the loop ends, so that it can be stored to memory afterwards. In a coroutine, where the loop can suspend and resume, anything that needs to live through the loop has to be put on the coroutine frame (i.e. the heap). So even though LICM turns the memory store into a scalar store, that scalar lives on the coroutine frame, and the scalar store eventually becomes a memory store again. In effect, we still have the same number of memory stores in the loop, and furthermore we have introduced one more entry in the frame to hold the sunk scalar.
2. Moving loads to before the loop. The reasoning is similar. To move a load to before the loop, we need a scalar to hold the result of the load, so that the loop can read that scalar instead. However, in a coroutine, if the scalar needs to live through the loop, it also has to be put on the coroutine frame, which is on the heap. Hence every read of the scalar in the loop is still a memory load. We end up with the same number of memory loads, and we have also added one more entry to the frame.
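
To make the two cases above concrete, here is a rough hand-written model in C++ of what a loop with a suspend point might look like after coroutine lowering, with and without promotion. This is only a sketch: the frame structs, the `resume` functions, and the access counters are all hypothetical names made up for illustration, the counters are incremented by hand at the places where the argument above says a memory access remains, and nothing here is actual LLVM output.

```cpp
#include <cstddef>

// Hand-maintained tally of memory accesses performed inside the loop body.
struct AccessCount {
    int loads = 0;
    int stores = 0;
};

// Hypothetical frame for the loop WITHOUT promotion:
//   for (i = 0; i < n; ++i) { *p = *p + 1; suspend; }
struct OriginalFrame {
    int *p;  // address updated by the loop
    int i;   // loop counter, live across the suspend
    int n;   // trip count
};

// Hypothetical frame for the same loop AFTER promotion. The promoted
// scalar is live across the suspend, so it needs its own frame slot.
struct PromotedFrame {
    int *p;
    int i;
    int n;
    int state;  // 0 = before loop, 1 = in loop, 2 = finished
    int tmp;    // the promoted scalar, spilled to the frame
};

// Runs one iteration up to the suspend point. Returns false when the loop is done.
bool resume(OriginalFrame &f, AccessCount &c) {
    if (f.i == f.n) return false;
    int v = *f.p;  ++c.loads;   // load from the original location
    *f.p = v + 1;  ++c.stores;  // store to the original location
    ++f.i;
    return true;                // suspend here
}

bool resume(PromotedFrame &f, AccessCount &c) {
    if (f.state == 0) {         // load moved to before the loop
        f.tmp = *f.p;
        f.state = 1;
    }
    if (f.i == f.n) {
        if (f.state == 1) {     // store sunk to after the loop
            *f.p = f.tmp;
            f.state = 2;
        }
        return false;
    }
    int v = f.tmp;    ++c.loads;   // the "scalar" read is a load from the frame
    f.tmp = v + 1;    ++c.stores;  // the "scalar" write is a store to the frame
    ++f.i;
    return true;                   // suspend here
}

// Drive each model to completion and report the in-loop access counts.
AccessCount runOriginal(int &cell, int n) {
    OriginalFrame f{&cell, 0, n};
    AccessCount c;
    while (resume(f, c)) {}
    return c;
}

AccessCount runPromoted(int &cell, int n) {
    PromotedFrame f{&cell, 0, n, 0, 0};
    AccessCount c;
    while (resume(f, c)) {}
    return c;
}
```

In this model, calling `runOriginal` and `runPromoted` with the same starting value and trip count leaves the same final value in the cell and tallies the same number of in-loop loads and stores, while `PromotedFrame` is larger than `OriginalFrame` because of the extra `tmp` slot.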

Does this make sense?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96928/new/

https://reviews.llvm.org/D96928