[PATCH] D59869: [NewPM] Fix a nasty bug with analysis invalidation in the new PM.

Wed Mar 27 14:54:24 PDT 2019

chandlerc added a comment.

In D59869#1445062 <https://reviews.llvm.org/D59869#1445062>, @philip.pfaffe wrote:

> Anyways, this patch looks good!

Thanks, will land it momentarily, but I do want to dig into your questions and Wei's questions here.

> Okay. Any plans to improve it? On the other hand, I'm not sure about the impact. We'll probably have to get some profile data first.

I don't see any really good ways of implementing this. See below, as I think Wei was also hitting on this, and I'll try to answer in a bit of detail.

But yes, if we find that this causes a serious drop in available cached results, we can investigate alternatives. My suspicion is that *this* won't be the real problem though, and so I'm not too worried.

In D59869#1445186 <https://reviews.llvm.org/D59869#1445186>, @wmi wrote:

> Thanks for working on the patch which will fix several build failures found internally! I believe the patch is already after a lot of balancing, and here are just some questions out of curiousity.
>
> In the patch, I saw two targets to achieve: T1. lazy invalidation. T2. maintain pass-manager driven invalidation scheme (mentioned in a comment). And I saw two compromises made, C1. If a SCC changes its IR and invalidate some analysis, the invalidated pass will be propagated not only to its SCC parent, but also to its grandparent and grandgrandparent ... or even SCC not being an ancestor on the call graph if only it is visited after the current SCC. C2. For CGSCC pass other than ArgumentPromotion, like inliner pass, even if it will not touch the IR other than the current SCC like what ArgumentPromotion does, it will still invalidate the analysis of other SCCs after this change.
>
> lazy invalidation seems to be achieved using the UR.CrossSCCPA set, so the purpose of the compromises seems mainly to maintain passmanager driven invalidation scheme. From my understanding, the scheme means invalidation initiated by pass manager instead of single IR Unit,  and pass manager can only invalidate the analysis of the IR managed by itself.
>
> I am wondering for C1, Can we have multiple CrossSCCPA instead of one (like a map, every SCC has its own CrossSCCPA) so if only the current SCC hasn't made any change, it can propagate preservedall to its parent SCC.

This seems really appealing, and I thought a bunch about it. I hit some really difficult problems though and don't really see a way to make it work. That said, if you see something I'm missing, please say so. It would certainly be convenient.

So, first, let's think about how to do this while still being lazy -- that is, the pass itself doesn't *eagerly* invalidate anything. Also, I'll use ArgumentPromotion as an example (with the caller being mutated while processing the callee SCC).

What we could do is, as you suggest have a map. The next question is, what key would we put in the map? It turns out this is ... a really tricky question. Let's assume that the caller is in an `SCC` (that was the case we hit in the wild certainly). In that case, we could put the caller's SCC into the map as a key. This creates a bunch of new problems unfortunately. The inliner or another pass may merge this SCC with other SCCs or split it appart. We'll have to update the map every time, and it may be somewhat complex to do the update. But it is worse than that, the function in question may *move* between SCCs. So we'll also have to be able to think about that form of update. Maybe we can solve this, but it already started to look very unappealing to me in terms of complexity and cost. Then I started thinking about the case where the caller isn't in an SCC *at all* because we haven't built the callers part of the call graph yet! So we would also need a two-level map so that we can add invalidation into SCCs that we form in the future from functions that have not yet been visited.

Ultimately, storing enough information to do the invalidation for individual IR units seems definitely do-able, but not trivial. I think we'd need to have a decent amount of infrastructure to update everything and make sure it stayed in sync as the structure changes. This is a big reason why I didn't try harder to address the issue in this direction. We can go down the road if we end up in practice *needing* this kind of preservation, but it does seem quite difficult.

A different approach would be to invalidate things eagerly. This removes any need to track things, which is nice. But it forces each pass that mutates code outside of its SCC to know all the details of invalidation. It also will perform that invalidation at a surprising step when nothing else would naturally be operating on that unit of IR. It also removes the ability to coalesce multiple invalidation events into a single one. That also seemed quite a high cost.

So this is how I ended up with the compromise here. It seemed like a basically minimal burden to get the most important preservation while still keeping things correct.

> For relieving C2, it seems to require returning PreservedAnalysesCrossSCC instead of PreservedAnalyses for each SCC pass so each pass can set PreservedAnalysesCrossSCC accordingly. That seems not very easy with little churn

This is an interestingly different compromise to tackle. I don't quite understand your suggestion (changing the return type of the pass seems like it would be fairly disruptive from an API / code perspective). But I think the idea here is very straight forward. The way I would do it is to have a synthetic "analysis" ID that can be preserved which symbolically represents preserving all other SCCs' analyses. Then, before updating the CrossSCCPA, we could check if that symbolic analysis is preserved. This would allow the inliner to mark in its pass `PA.preserve<SomeSymbolicNameHere>()` and avoid any extraneous invalidation.

I'm fine adding this if we want it, but it wasn't clear there was any realistic advantage. I don't really expect that many cached analysis results to exist for parent functions. It seemed like the case we found in the wild was much more of the edge case. But if you or any others are seeing more significant caching here, then yes, we should add some symbolic analysis so that we can preserve it.

It also has the nice effect of remaining conservatively correct -- a pass will have to explicitly think about the fact that it is not mutating other SCCs before setting this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59869/new/

https://reviews.llvm.org/D59869