[PATCH] D27198: [PM] Introduce the facilities for registering cross-IR-unit dependencies that require deferred invalidation.

Sun Dec 11 17:50:26 PST 2016

chandlerc added a comment.

FYI, working on one API improvement idea and on addressing the tactical comments from Sean that are spot on here, but wanted to reply to the two discussion threads...

================
Comment at: lib/Analysis/CGSCCPassManager.cpp:103
   // entire SCC layer as well rather than trying to do invaliadtion ourselves.
-  if (!PA.preserved<CGSCCAnalysisManagerModuleProxy>() ||
+  if (!PA.preserved<CGSCCAnalysisManagerModuleProxy, AllAnalysesOn<Module>>() ||
       Inv.invalidate<LazyCallGraphAnalysis>(M, PA) ||
----------------
silvas wrote:
> jlebar wrote:
> > chandlerc wrote:
> > > jlebar wrote:
> > > > Hm, now that I see it being used, I am even less thrilled about this API.  It's not at all obvious what the second template argument means here, and it also seems super easy to forget to pass this the relevant arguments.  In addition, if I ever add a new abstract set, I have to go and modify every preserved() call.
> > > > 
> > > > Would it be out of the question to encapsulate within (say) the CGSCCAnalysisManagerModuleProxy type the sets that cover it, so that we could continue to pass only one type to preserved<...>()?
> > > I mean, I don't disagree with any of this, but I've not come up with a better alternative really.
> > > 
> > > I know there are going to be more sets than IR-unit derived ones such as CFG-preserving. =/ So bundling it inside the proxy doesn't seem like it'd be a great alternative... And it would still be quite hard to make work.
> > > 
> > > The key thing that needs to happen is that one layer needs to be able to introduce a preserved set for an IR unit, and then some other part of the code needs to subtract one analysis from that set, and then when we call 'invalidate' on *that* analysis it needs to not pay attention to the set.
> > > 
> > > Anyways, any better API ideas here are very, very welcome. =/
> > > Anyways, any better API ideas here are very, very welcome. =/
> > 
> > One idea was up earlier in the review:
> > 
> > > Could we require that these abstract sets inherit from some type [or otherwise have some way to tell the difference between an analysis and a set of analyses]?
> > 
> > At least then PA could catch some incorrect uses of its API.
> > 
> > (I understand that is solving a different problem than the one I was originally commenting on here.)
> > 
> > In terms of this problem, are you saying that we can neither
> > 
> > a) have an abstract analysis set enumerate its passes, nor can we
> > b) have an analysis enumerate its abstract analysis sets,
> > 
> > because the only layer that knows about all of the relevant sets and passes does not declare the sets or the passes?  If so this seems remarkably fragile, to the point that I would want to step back and consider whether the layering we've imposed is actually helpful -- that is, whether the design space is overconstrained.
> > 
> > TBH I am pretty concerned that nobody other than you and Sean is going to be smart enough to program this correctly.  For example, writing
> > 
> >    !PA.preserved<CGSCCAnalysisManagerModuleProxy, AllAnalysesOn<Module>>()
> > 
> > requires global knowledge of LLVM that the only analysis set that covers CGSCCAnalysisManagerModuleProxy is AllAnalysesOn<Module>.  (Or it somehow requires even more arcane knowledge that AllAnalysesOn<Module> is the only set that we need to enumerate here.)  If you get it wrong, things will mostly work, until they don't, so these are not going to be easy bugs to find.
> > 
> > My understanding is that a lot of the design complexity is motivated by a desire to allow outside-tree users to provide new kinds of PMs.  That's a laudable goal, but on average I would expect out-of-tree users to have less knowledge of LLVM core than your average core developer, so such an API is really only useful if it's hard to screw up.  If we can't make it hard for them to do the wrong thing, and if providing this loose-coupling mechanism adds substantial complexity to our internal design, I am personally not convinced we are making the right design tradeoffs.
> > In terms of this problem, are you saying that we can neither
> > 
> > a) have an abstract analysis set enumerate its passes, nor can we
> > b) have an analysis enumerate its abstract analysis sets,
> 
> b) is possible. It's just somewhat inconvenient right now because that information is hidden in the `invalidate` method which is on the analysis result object instead of the analysis itself.
> 
> 
> 
> > TBH I am pretty concerned that nobody other than you and Sean is going to be smart enough to program this correctly. 
> 
> For the record, I'm not convinced that I would be able to program this correctly. I don't like the approach that Chandler is taking here for precisely this reason. Explicitly tracking dependencies between analysis results so that there is a clear single point of truth in the analysis manager for the primitive operation "I need to invalidate analysis result X, invalidate all analysis results that depend on it" makes all of this so much easier.
> 
> If you haven't read it yet and want to understand the problem of analysis result invalidation better, I highly recommend reading (or at least skimming) the thread "[PM] I think that the new PM needs to learn about inter-analysis dependencies...": https://groups.google.com/d/topic/llvm-dev/4m_Lv3Rfylg/discussion
> Especially this post: https://groups.google.com/d/msg/llvm-dev/4m_Lv3Rfylg/ss-UZ0wQDQAJ
> 
> 
> 
Justin wrote:

> In terms of this problem, are you saying that we can neither
> 
> a) have an abstract analysis set enumerate its passes, nor can we
> b) have an analysis enumerate its abstract analysis sets,

I believe that we have (b) -> the analysis enumerates these in its result's invalidate routine.

The problem is that the API for doing this isn't good. I have some ideas about improving the API after thinking more on it. One thing that I did experiment with and continue to dislike is trying to do this *declaratively*. Every version of that I've come up with has been, IMO, much harder to understand.

> TBH I am pretty concerned that nobody other than you and Sean is going to be smart enough to program this correctly.

I don't think this is about smarts. =] I think this is largely a problem of documentation and API design. I think your review is helping both of those.

I also think it is important to understand how rarely this complexity will come up. Most analyses will simply:

1) Use the default invalidation logic which works out of the box, or
2) Simply declare that they are never invalidated because they are fundamentally immutable or self-updating, or
3) Implement an invalidate routine that checks a few common sets like 'CFG' in addition to themselves.

Everything else is relatively rare. The next most common case are analyses which embed references to other analyses in their results. Some of these are because it was easy rather than because it was the right design. But some will need to use the `Invalidator` logic provided in the previous review.

Most of the facilities I'm adding in this patch to be very rarely used. That doesn't mean it gets a free pass of course, it still needs to be clearly documented and have examples that show how to use it and not be easy to misuse in subtle ways.

The facility I am most concerned about (and I called it out, and you called it out) is just letting an analysis result check an additional set or two. That is currently too confusing, agreed, and I'd like a better API for that. I'm experimenting with the idea you suggested Justin and I think it might help, but I can't yet be certain. I'll update the patch if/when I get something interesting.

I also don't think we should strive for perfection in a single patch if there aren't terribly good ideas yet.

Regarding the meta point Sean, I continue to think that unifying the analysis management is the wrong design. I think it creates serious issues when expressing analyses on IR units defined by analyses, which is functionality that I very much want in the design.

================
Comment at: unittests/IR/PassManagerTest.cpp:431

+/// A test analysis pass which chaches in its result the result from the above
+/// indirect analysis pass.
----------------
jlebar wrote:
> chandlerc wrote:
> > jlebar wrote:
> > > silvas wrote:
> > > > nit: avoid the terminology "analysis pass". In the new PM analyses and transformations are separate concepts. The term "pass" doesn't help because it conflates the two (and many uses in the code use "pass" to really mean "transformation", so "analysis pass" is particularly confusing and old-PM'ish).
> > > > 
> > > > Hopefully some day we can rename things to be more consistent. Really the new "PM" has just two main things: an `AnalysisCache` class and a bunch of composable `TransformationRunner`'s. There isn't a conflated concept of "pass" (which can be either a transformation or an analysis) like in the old PM.
> > > +1 to that in principle, although if we actually carry that out, we're going to have to do a big refactoring, so until then my personal preference would be that we should just say whatever is clear, rather than adding in the new terminology in places where "pass" would, in the current state, be more clear.
> > > 
> > > Not begging the specific question of what to say here.
> > I don't actually think of it this way.
> > 
> > I think there is a common underlying idea of a pass, and there are two primary special cases: analysis passes and transformation passes.
> > 
> > I understand that many (most?) analysis passes tend to be trivial and we instead focus on the analysis result and caching it, but I don't want to neglect the fact that it is a pass that gets run over the IR. In that sense, the `AnalysisManager` *is* a `pass manager as well.
> > 
> > Anyways, I'm happy to spend some time debating this long term, but I'm not sure it's the right focus of this code review....
> >  I'm not sure it's the right focus of this code review....
> 
> I tend to lose state on anything I'm not working on in a week.  If you start the discussion in the forum of your choosing Monday, great.  If you start a thread in two or three weeks, you will probably still have state, but I may not, and then I will either decide I no longer care, or have to come back and page all this back in.  Either way is not fun.
> 
> In fact I now vaguely recall that we had an outstanding question from a previous patch that we said we'd discuss outside the review.  Maybe we did come back and it's been resolved?  I am sort of a goldfish.
> 
> Because I've now been thinking about it, let me just say what I have in mind so we can capture it somewhere.  I hope that's OK.
> 
> > the AnalysisManager *is* a `pass manager as well.
> 
> I would ask a slightly different question.  Instead of "is AM technically a PM?", I'd ask, "is it *useful* to an engineer of average skill to think of AM as a PM, and to think of Analyses as Passes?"
> 
> One can even have comments on AM explaining this technicality if it's useful to understand when thinking about the PM/AM framework.  But then, is this happenstance of abstraction *so fundamental* that we should also use it everywhere else?  Maybe I haven't fully understood the code because I don't grok that an analysis is just a monoid in the category of endofunctors^W^W^W^W^W^Wpass.  :)
> 
> Personally I think "pass" is a useful word because "optimization pass" was a term I knew before I started working on compilers.  But like Sean I am not sure it helps me more than it hurts to think of analyses as the same sort of thing.
Totally good to capture it. =] I'm hoping to discuss this more with you on Monday though and we can kick off some ideas on the list and/or IRC. I wasn't planning on waiting weeks and weeks.

https://reviews.llvm.org/D27198