[PATCH] D27198: [PM] Introduce the facilities for registering cross-IR-unit dependencies that require deferred invalidation.

Sun Dec 11 13:38:45 PST 2016

silvas added a comment.

A couple suggestions to make this patch more understadable.

================
Comment at: include/llvm/IR/PassManager.h:75
 /// the IR is not mutated at all.
 class PreservedAnalyses {
 public:
----------------
The comment on this class needs to be beefed up in response to this patch. The original idea behind PreservedAnalyses was that it was a whitelist of analyses to preserve, so that invalidation was conservatively correct. The addition of `NotPreservedAnalysisIDs` breaks with this approach and adds a layer of complexity that isn't adequately addressed by the current comment.

================
Comment at: include/llvm/IR/PassManager.h:109
+  ///
+  /// Note that you can only abandon a specific analysis, not a *set* of
+  /// analyses.
----------------
This restriction seems like it stems from an implementation detail. Some implementation-level comment should explain its origin.

================
Comment at: include/llvm/IR/PassManager.h:244
+  /// analysis not preserved. It always wins over the above set and should not
+  /// include any synthetic set IDs such as the "all" ID.
+  SmallPtrSet<AnalysisKey *, 2> NotPreservedAnalysisIDs;
----------------
This restriction about no sets in `NotPreservedAnalysisIDs` needs to be explained better in the comments here. More generally, there is a clear asymmetry between the `PreservedAnalysisIDs` and `NotPreservedAnalysisIDs` that needs to be explained (not just the "rules", but *why* the rules are there). Maybe that is appropriate for the class comment?

================
Comment at: lib/Analysis/CGSCCPassManager.cpp:103
   // entire SCC layer as well rather than trying to do invaliadtion ourselves.
-  if (!PA.preserved<CGSCCAnalysisManagerModuleProxy>() ||
+  if (!PA.preserved<CGSCCAnalysisManagerModuleProxy, AllAnalysesOn<Module>>() ||
       Inv.invalidate<LazyCallGraphAnalysis>(M, PA) ||
----------------
jlebar wrote:
> chandlerc wrote:
> > jlebar wrote:
> > > Hm, now that I see it being used, I am even less thrilled about this API.  It's not at all obvious what the second template argument means here, and it also seems super easy to forget to pass this the relevant arguments.  In addition, if I ever add a new abstract set, I have to go and modify every preserved() call.
> > > 
> > > Would it be out of the question to encapsulate within (say) the CGSCCAnalysisManagerModuleProxy type the sets that cover it, so that we could continue to pass only one type to preserved<...>()?
> > I mean, I don't disagree with any of this, but I've not come up with a better alternative really.
> > 
> > I know there are going to be more sets than IR-unit derived ones such as CFG-preserving. =/ So bundling it inside the proxy doesn't seem like it'd be a great alternative... And it would still be quite hard to make work.
> > 
> > The key thing that needs to happen is that one layer needs to be able to introduce a preserved set for an IR unit, and then some other part of the code needs to subtract one analysis from that set, and then when we call 'invalidate' on *that* analysis it needs to not pay attention to the set.
> > 
> > Anyways, any better API ideas here are very, very welcome. =/
> > Anyways, any better API ideas here are very, very welcome. =/
> 
> One idea was up earlier in the review:
> 
> > Could we require that these abstract sets inherit from some type [or otherwise have some way to tell the difference between an analysis and a set of analyses]?
> 
> At least then PA could catch some incorrect uses of its API.
> 
> (I understand that is solving a different problem than the one I was originally commenting on here.)
> 
> In terms of this problem, are you saying that we can neither
> 
> a) have an abstract analysis set enumerate its passes, nor can we
> b) have an analysis enumerate its abstract analysis sets,
> 
> because the only layer that knows about all of the relevant sets and passes does not declare the sets or the passes?  If so this seems remarkably fragile, to the point that I would want to step back and consider whether the layering we've imposed is actually helpful -- that is, whether the design space is overconstrained.
> 
> TBH I am pretty concerned that nobody other than you and Sean is going to be smart enough to program this correctly.  For example, writing
> 
>    !PA.preserved<CGSCCAnalysisManagerModuleProxy, AllAnalysesOn<Module>>()
> 
> requires global knowledge of LLVM that the only analysis set that covers CGSCCAnalysisManagerModuleProxy is AllAnalysesOn<Module>.  (Or it somehow requires even more arcane knowledge that AllAnalysesOn<Module> is the only set that we need to enumerate here.)  If you get it wrong, things will mostly work, until they don't, so these are not going to be easy bugs to find.
> 
> My understanding is that a lot of the design complexity is motivated by a desire to allow outside-tree users to provide new kinds of PMs.  That's a laudable goal, but on average I would expect out-of-tree users to have less knowledge of LLVM core than your average core developer, so such an API is really only useful if it's hard to screw up.  If we can't make it hard for them to do the wrong thing, and if providing this loose-coupling mechanism adds substantial complexity to our internal design, I am personally not convinced we are making the right design tradeoffs.
> In terms of this problem, are you saying that we can neither
> 
> a) have an abstract analysis set enumerate its passes, nor can we
> b) have an analysis enumerate its abstract analysis sets,

b) is possible. It's just somewhat inconvenient right now because that information is hidden in the `invalidate` method which is on the analysis result object instead of the analysis itself.

> TBH I am pretty concerned that nobody other than you and Sean is going to be smart enough to program this correctly. 

For the record, I'm not convinced that I would be able to program this correctly. I don't like the approach that Chandler is taking here for precisely this reason. Explicitly tracking dependencies between analysis results so that there is a clear single point of truth in the analysis manager for the primitive operation "I need to invalidate analysis result X, invalidate all analysis results that depend on it" makes all of this so much easier.

If you haven't read it yet and want to understand the problem of analysis result invalidation better, I highly recommend reading (or at least skimming) the thread "[PM] I think that the new PM needs to learn about inter-analysis dependencies...": https://groups.google.com/d/topic/llvm-dev/4m_Lv3Rfylg/discussion
Especially this post: https://groups.google.com/d/msg/llvm-dev/4m_Lv3Rfylg/ss-UZ0wQDQAJ

https://reviews.llvm.org/D27198