[PATCH] D50985: [SCEV] LoopsUsed memoization

Fri Aug 24 09:52:55 PDT 2018

rtereshin added a comment.

Hi Maxim,

Thank you for looking into this!

Given my current understanding I believe that as soon as (1) `LoopsRefd` is a proper inverse (or undefined) map of `LoopUsers` we should be fine, unless (2) some passes mis-use Scalar Evolution's APIs, for instance, do not call `forgetLoop` or `forgetTopmostLoop` on changed or deleted Loops.

For (1), I see `LoopUsers` getting changed only in a few places:
a) https://github.com/llvm-mirror/llvm/blob/5c1bd30b863cf271dbe70219b0bb5717d1e2ec7e/lib/Analysis/ScalarEvolution.cpp#L11813
b) https://github.com/llvm-mirror/llvm/blob/5c1bd30b863cf271dbe70219b0bb5717d1e2ec7e/lib/Analysis/ScalarEvolution.cpp#L6778

I believe in both cases the changes are reflected in `LoopsRefd`.

As for (2) if that's the case we already have a problem regardless applying this patch or not, which is also not going to be surfaced by testing Scalar Evolution's APIs, assuming such tests use the APIs as intended, which is probably a safe assumption.

I could also add that the CTMark compiles generated exactly the same binaries before and after this patch.

================
Comment at: lib/Analysis/ScalarEvolution.cpp:11763
   MinTrailingZerosCache.erase(S);
+  LoopsRefd.erase(S);

----------------
mkazantsev wrote:
> I don't think it is sufficient for correctness. Imagine the situation: `A = {1,+,1}<some_loop>, B = A + 1`. Something has changed for A, and it is no longer using `some_loop`. How do we say that `B` is not using it as well?
> 
> I also think that something needs to be done about it in `forgetLoop`. This method is called, for example, when a loop becomes non-existent, or when the set of its blocks changes. And you may end up with many cache bundles keeping references to this loop.
> I don't think it is sufficient for correctness. Imagine the situation: A = {1,+,1}<some_loop>, B = A + 1. Something has changed for A, and it is no longer using some_loop. How do we say that B is not using it as well?

The loop is part of the SCEV node's identity: https://github.com/llvm-mirror/llvm/blob/5c1bd30b863cf271dbe70219b0bb5717d1e2ec7e/lib/Analysis/ScalarEvolution.cpp#L3423

Therefore A can not just stop using some_loop, it could only be recreated with a different loop attached and it will be a different (by reference) node A'.
Same for B, it will be a different node B' = A' + 1 with no cached results for either B' or A'. At least, this is my current understanding of what's going on here.

> I also think that something needs to be done about it in forgetLoop. This method is called, for example, when a loop becomes non-existent, or when the set of its blocks changes.

I believe it's already done: https://github.com/llvm-mirror/llvm/blob/5c1bd30b863cf271dbe70219b0bb5717d1e2ec7e/lib/Analysis/ScalarEvolution.cpp#L6774-L6779

> And you may end up with many cache bundles keeping references to this loop.

I think this is only possible if a pass using Scalar Evolution and changing loops violates the API and doesn't call `forgetLoop` on an invalidated Loop. If so, it's that pass' issue, not Scalar Evolution's, and heavy-testing Scalar Evolution's APIs won't surface such bugs at all.

If `forgetLoop` is properly called however, it would clean the cache for every SCEV referencing the Loop (as soon as `LoopsRefd` is a proper inverse map of `LoopUsers`), so no stale bundles I believe.
Also, given that every SCEV B referencing a SCEV A references all the loops referenced by A (and potentially more), it will end up clearing this cache for every SCEV node (transitively) dependent on a directly affected SCEV.
It's probably worth noting also that `forgetLoop` "forgets" all the contained loops as well, not just the one explicitly specified.

Repository:
  rL LLVM

https://reviews.llvm.org/D50985