[PATCH] D63459: Loop Cache Analysis

Tue Jul 9 08:05:24 PDT 2019

etiotto marked 17 inline comments as done.
etiotto added inline comments.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:590-591
+  if (getInnerMostLoop(Loops) == nullptr) {
+    LLVM_DEBUG(dbgs() << "Cannot compute cache cost of loop nest with more "
+                         "than one innermost loop\n");
+    return nullptr;
----------------
Meinersbur wrote:
> etiotto wrote:
> > fhahn wrote:
> > > etiotto wrote:
> > > > Meinersbur wrote:
> > > > > If the intention is to provide analysis for innermost loops only, why this limitation? Could it just return an analysis result for each innermost loop?
> > > > > 
> > > > > If the analysis requires a global view to determine the cost for each loop, wouldn't a FunctionPass be more appropriate? Currently, it seems users first need get the LoopCacheAnalysis for a topmost loops, the ask it for one of its nested loops.
> > > > > 
> > > > > Are such loop nests not analyzable at all?
> > > > > ```
> > > > >   while (repeat) {
> > > > >     for (int i = ...)
> > > > >       for (int j = ...)
> > > > >         B[i][j] = ... A[i+1][j+1] ... // stencil
> > > > >     for (int i = ...)
> > > > >       for (int j = ...)
> > > > >         A[i][j] = ... B[i+1][j+1] ... // stencil
> > > > >   }
> > > > > ```
> > > > > 
> > > > > 
> > > > > 
> > > > The current scope of this PR is to analyze loop nests that have a single innermost loop.  The analysis returns a vector of loop costs for each loop in a loop nest. This is not a hard requirement, however I would like to extend the scope of the transformation in a future PR. One of the scenario I had in mind, at least initially, as a consumer of this analysis is loop interchange which operates on perfect nests and therefore the current implementation of the analysis is sufficient for that use.
> > > > 
> > > > If we were to make the analysis a function pass that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive especially considering that loop interchange is currently a loop pass.
> > > > 
> > > I'll try to have a closer look in a few days, but I think for the use in LoopInterchange, it would not need to be an analysis pass (I suppose there won't be a big benefit to preserve this as an analysis?). 
> > > 
> > > I think a lightweight interface to query the cost for certain valid permutations would be sufficient. I think it would be great if we only compute the information when directly required (i.e. we only need to compute the costs for loops we can interchange, for the permutations valid to interchange)
> > Looking forward to you comments. You are correct, we could teach loop interchange about the locality of accesses in a loop nest. There are several other loop transformations that can benefit from it. The paper that introduces the analysis (Compiler Optimizations for Improving Data Locality - http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf) illustrates how the analysis was used to guide several other loop transformation, namely loop reversal, loop fusion, and loop distribution. Given the applicability of the analysis to several transformation I think it would make sense to centralize the code as an analysis pass.  
> > 
> > 
> > If we were to make the analysis a function pass that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive especially considering that loop interchange is currently a loop pass.
> 
> I don't think the new pass manager has this restriction. Passes can ask for analyses of any level using OuterAnalysisManagerProxy. The new pass manager caches everything.
Yes it is also my understanding (from the comments in PassManager.h) that the new pass manager does allows an "inner" pass (e.g. a Loop Pass) to ask for an "outer" analysis (e.g. a Function Analysis). However, at least from the comments in PassManager.h, the inner pass cannot cause the outer analysis to run and can only rely on the cached version which may give back a nullptr. 

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:51
+
+  Loop *CurrentLoop = const_cast<Loop *>(&Root);
+  Loops.push_back(CurrentLoop);
----------------
Meinersbur wrote:
> Before `const_cast` why not removing the `const` from the parameter list?
I dropped the const qualifier in the parameter declaration and adjusted the code accordingly.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:102
+  OS << *R.BasePointer;
+  for (auto *Subscript : R.Subscripts)
+    OS << "[" << *Subscript << "]";
----------------
Meinersbur wrote:
> [[ https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable | Don’t “almost always” use auto ]]
> ```
> for (const SCEV *Subscript : R.subscripts)
> ```
> 
> You seem to have applied the coding standard for auto everywhere except in foreach-loops.
OK I'll fix the use of auto in foreach-loops.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:145
+  // all subscripts must be equal, except the leftmost one (the last one).
+  for (auto SubNum : seq<unsigned>(0, NumSubscripts - 1)) {
+    if (getSubscript(SubNum) != Other.getSubscript(SubNum)) {
----------------
Meinersbur wrote:
> Nice, this is where the coding standard recommends `auto`.
Thanks.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:172-175
+  if (InSameCacheLine)
+    LLVM_DEBUG(dbgs().indent(2) << "Found spacial reuse.\n");
+  else
+    LLVM_DEBUG(dbgs().indent(2) << "No spacial reuse.\n");
----------------
Meinersbur wrote:
> Some compilers may complain about empty statements in non-debug builds. Alternative:
> ```
> LLVM_DEBUG({
>  if (InSameCacheLine)
>     dbgs().indent(2) << "Found spacial reuse.\n";
>   else
>     dbgs().indent(2) << "No spacial reuse.\n";
> });
> ```
Agree. Is cleaner the way you propose as there is only one LLVM_DEBUG.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:498-499
+
+  const unsigned CLS = TTI.getCacheLineSize();
+  const Loop *InnerMostLoop = getInnerMostLoop(Loops);
+  assert(InnerMostLoop != nullptr && "Expecting a valid innermost loop");
----------------
Meinersbur wrote:
> The LLVM code base does not use `const` for stack variables.
const qualifying local variables preempts unintended modifications later in the function... but I do not feel strongly about it. I'll change it.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63459/new/

https://reviews.llvm.org/D63459