[PATCH] D63459: Loop Cache Analysis

Tue Jul 9 11:24:03 PDT 2019

Meinersbur added inline comments.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:590-591
+  if (getInnerMostLoop(Loops) == nullptr) {
+    LLVM_DEBUG(dbgs() << "Cannot compute cache cost of loop nest with more "
+                         "than one innermost loop\n");
+    return nullptr;
----------------
fhahn wrote:
> etiotto wrote:
> > Meinersbur wrote:
> > > etiotto wrote:
> > > > fhahn wrote:
> > > > > etiotto wrote:
> > > > > > Meinersbur wrote:
> > > > > > > If the intention is to provide analysis for innermost loops only, why this limitation? Could it just return an analysis result for each innermost loop?
> > > > > > > 
> > > > > > > If the analysis requires a global view to determine the cost for each loop, wouldn't a FunctionPass be more appropriate? Currently, it seems users first need get the LoopCacheAnalysis for a topmost loops, the ask it for one of its nested loops.
> > > > > > > 
> > > > > > > Are such loop nests not analyzable at all?
> > > > > > > ```
> > > > > > >   while (repeat) {
> > > > > > >     for (int i = ...)
> > > > > > >       for (int j = ...)
> > > > > > >         B[i][j] = ... A[i+1][j+1] ... // stencil
> > > > > > >     for (int i = ...)
> > > > > > >       for (int j = ...)
> > > > > > >         A[i][j] = ... B[i+1][j+1] ... // stencil
> > > > > > >   }
> > > > > > > ```
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > The current scope of this PR is to analyze loop nests that have a single innermost loop.  The analysis returns a vector of loop costs for each loop in a loop nest. This is not a hard requirement, however I would like to extend the scope of the transformation in a future PR. One of the scenario I had in mind, at least initially, as a consumer of this analysis is loop interchange which operates on perfect nests and therefore the current implementation of the analysis is sufficient for that use.
> > > > > > 
> > > > > > If we were to make the analysis a function pass that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive especially considering that loop interchange is currently a loop pass.
> > > > > > 
> > > > > I'll try to have a closer look in a few days, but I think for the use in LoopInterchange, it would not need to be an analysis pass (I suppose there won't be a big benefit to preserve this as an analysis?). 
> > > > > 
> > > > > I think a lightweight interface to query the cost for certain valid permutations would be sufficient. I think it would be great if we only compute the information when directly required (i.e. we only need to compute the costs for loops we can interchange, for the permutations valid to interchange)
> > > > Looking forward to you comments. You are correct, we could teach loop interchange about the locality of accesses in a loop nest. There are several other loop transformations that can benefit from it. The paper that introduces the analysis (Compiler Optimizations for Improving Data Locality - http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf) illustrates how the analysis was used to guide several other loop transformation, namely loop reversal, loop fusion, and loop distribution. Given the applicability of the analysis to several transformation I think it would make sense to centralize the code as an analysis pass.  
> > > > 
> > > > 
> > > > If we were to make the analysis a function pass that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive especially considering that loop interchange is currently a loop pass.
> > > 
> > > I don't think the new pass manager has this restriction. Passes can ask for analyses of any level using OuterAnalysisManagerProxy. The new pass manager caches everything.
> > Yes it is also my understanding (from the comments in PassManager.h) that the new pass manager does allows an "inner" pass (e.g. a Loop Pass) to ask for an "outer" analysis (e.g. a Function Analysis). However, at least from the comments in PassManager.h, the inner pass cannot cause the outer analysis to run and can only rely on the cached version which may give back a nullptr. 
> >   
> > . Given the applicability of the analysis to several transformation I think it would make sense to centralize the code as an analysis pass.
> 
> I agree it is a useful utility to have, I am just wondering what the benefit of exposing it as an analysis pass would be, as it is unlikely that the result would be used for most loops or could be cached between interested transforms frequently.
> 
> IMO it would be fine as a utility class/function, i.e. just provide a `getLoopCacheCost()` function that takes a root loop and maybe a potential re-ordering of loops, which the interested transforms can use only on the loops that they can transform. I think that would reduce the size of the patch a bit and focus on the important bits.
I convinced myself that indeed a FunctionPass might not be that nice, because any LoopPass would need to preserve it. As @fhahn points out, wanting to preserve/reuse the analysis might be rare.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63459/new/

https://reviews.llvm.org/D63459