[PATCH] D63459: Loop Cache Analysis

Michael Kruse via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 8 17:08:14 PDT 2019


Meinersbur added a comment.

@fhahn Do you still want to look over this patch?



================
Comment at: llvm/include/llvm/Analysis/LoopCacheAnalysis.h:54
+  const SCEV *getBasePointer() const { return BasePointer; }
+  unsigned getNumSubscripts() const { return Subscripts.size(); }
+  const SCEV *getSubscript(unsigned SubNum) const {
----------------
Should return `size_t`
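A minimal sketch of the suggested signature (the accessor itself is from the patch; `SmallVector::size()` already yields a `size_t`-compatible value):
```
size_t getNumSubscripts() const { return Subscripts.size(); }
```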


================
Comment at: llvm/include/llvm/Analysis/LoopCacheAnalysis.h:158
+using ReferenceGroupTy = SmallVector<std::unique_ptr<IndexedReference>, 8>;
+using ReferenceGroupsTy = SmallVector<std::unique_ptr<ReferenceGroupTy>, 8>;
+
----------------
It does not make sense to have owning pointers to SmallVectors. This just adds another level of indirection compared to containing a `std::vector` (or a `SmallVector`, to have a single allocation for all elements if none exceeds the small size) directly.
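For example, the second alias could hold the groups by value (a sketch; the type names are taken from the patch):
```
using ReferenceGroupTy = SmallVector<std::unique_ptr<IndexedReference>, 8>;
using ReferenceGroupsTy = SmallVector<ReferenceGroupTy, 8>; // groups stored by value, no extra indirection
```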


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:590-591
+  if (getInnerMostLoop(Loops) == nullptr) {
+    LLVM_DEBUG(dbgs() << "Cannot compute cache cost of loop nest with more "
+                         "than one innermost loop\n");
+    return nullptr;
----------------
etiotto wrote:
> fhahn wrote:
> > etiotto wrote:
> > > Meinersbur wrote:
> > > > If the intention is to provide analysis for innermost loops only, why this limitation? Could it just return an analysis result for each innermost loop?
> > > > 
> > > > If the analysis requires a global view to determine the cost for each loop, wouldn't a FunctionPass be more appropriate? Currently, it seems users first need to get the LoopCacheAnalysis for a topmost loop, then ask it for one of its nested loops.
> > > > 
> > > > Are such loop nests not analyzable at all?
> > > > ```
> > > >   while (repeat) {
> > > >     for (int i = ...)
> > > >       for (int j = ...)
> > > >         B[i][j] = ... A[i+1][j+1] ... // stencil
> > > >     for (int i = ...)
> > > >       for (int j = ...)
> > > >         A[i][j] = ... B[i+1][j+1] ... // stencil
> > > >   }
> > > > ```
> > > > 
> > > > 
> > > > 
> > > The current scope of this PR is to analyze loop nests that have a single innermost loop. The analysis returns a vector with a cost for each loop in the nest. This is not a hard requirement; however, I would like to extend the scope in a future PR. One of the scenarios I had in mind, at least initially, as a consumer of this analysis is loop interchange, which operates on perfect nests, so the current implementation of the analysis is sufficient for that use.
> > > 
> > > If we were to make the analysis a function pass, that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive, especially considering that loop interchange is currently a loop pass.
> > > 
> > I'll try to have a closer look in a few days, but I think for the use in LoopInterchange, it would not need to be an analysis pass (I suppose there won't be a big benefit to preserve this as an analysis?). 
> > 
> > I think a lightweight interface to query the cost for certain valid permutations would be sufficient. I think it would be great if we only computed the information when directly required (i.e. we only need to compute the costs for loops we can interchange, and only for the permutations that are valid to interchange).
> Looking forward to your comments. You are correct, we could teach loop interchange about the locality of accesses in a loop nest. There are several other loop transformations that can benefit from it. The paper that introduces the analysis (Compiler Optimizations for Improving Data Locality - http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf) illustrates how it was used to guide several other loop transformations, namely loop reversal, loop fusion, and loop distribution. Given the applicability of the analysis to several transformations, I think it would make sense to centralize the code as an analysis pass.
> 
> 
> If we were to make the analysis a function pass, that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive, especially considering that loop interchange is currently a loop pass.

I don't think the new pass manager has this restriction. Passes can ask for analyses of any level using OuterAnalysisManagerProxy. The new pass manager caches everything.
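For reference, a loop pass under the new pass manager can reach function-level analyses roughly along these lines (a sketch only: `MyLoopPass` and `SomeFunctionAnalysis` are placeholder names, and only results that are already cached are visible through the outer proxy):
```
// Sketch; "MyLoopPass" and "SomeFunctionAnalysis" are placeholders.
PreservedAnalyses MyLoopPass::run(Loop &L, LoopAnalysisManager &AM,
                                  LoopStandardAnalysisResults &AR,
                                  LPMUpdater &) {
  const auto &FAM =
      AM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();
  Function &F = *L.getHeader()->getParent();
  // Outer-level results must already be cached to be retrievable here.
  if (const auto *Result = FAM.getCachedResult<SomeFunctionAnalysis>(F)) {
    // ... use the function-level analysis result ...
    (void)Result;
  }
  return PreservedAnalyses::all();
}
```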


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:51
+
+  Loop *CurrentLoop = const_cast<Loop *>(&Root);
+  Loops.push_back(CurrentLoop);
----------------
Instead of the `const_cast`, why not remove the `const` from the parameter list?


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:55
+  // Collect the loops in the incoming nest in breadth-first order.
+  std::deque<Loop*> WL;
+  WL.push_back(CurrentLoop);
----------------
`WL` is used as a stack; a (Small)Vector could do this as well.
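For instance (a sketch; the loop body is an assumption about the rest of the function, which is not shown in the snippet):
```
SmallVector<Loop *, 8> WL;
WL.push_back(CurrentLoop);

while (!WL.empty()) {
  Loop *L = WL.pop_back_val();
  for (Loop *SubLoop : *L) { // immediate sub-loops
    Loops.push_back(SubLoop);
    WL.push_back(SubLoop);
  }
}
```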


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:102
+  OS << *R.BasePointer;
+  for (auto *Subscript : R.Subscripts)
+    OS << "[" << *Subscript << "]";
----------------
[[ https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable | Don’t “almost always” use auto ]]
```
for (const SCEV *Subscript : R.Subscripts)
```

You seem to have applied the coding standard for auto everywhere except in foreach-loops.


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:116
+    : IsValid(false), StoreOrLoadInst(StoreOrLoadInst), BasePointer(nullptr),
+      Subscripts(), Sizes(), SE(SE) {
+  assert((isa<StoreInst>(StoreOrLoadInst) || isa<LoadInst>(StoreOrLoadInst)) &&
----------------
No need for `Subscripts(), Sizes()`

Prefer default member initializers at the declaration over `IsValid(false), BasePointer(nullptr)` in the init list.
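I.e. roughly (a sketch; the member list and container sizes are inferred from the init list above and are not authoritative):
```
class IndexedReference {
  // ...
  Instruction &StoreOrLoadInst;
  const SCEV *BasePointer = nullptr;       // default member initializer
  SmallVector<const SCEV *, 3> Subscripts; // default-constructed; nothing to
  SmallVector<const SCEV *, 3> Sizes;      //   mention in the init list
  ScalarEvolution &SE;
  bool IsValid = false;                    // default member initializer
};
```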


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:120
+
+  IsValid = delinearize(LI);
+  if (IsValid)
----------------
[subjective] I'd prefer a function that returns a `nullptr`/`ErrorOr<IndexedReference>` rather than an `IsValid` flag on every object.
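E.g. a static factory along these lines (a sketch only; the `create` name is made up, and it assumes the constructor is reduced to plain member setup so `delinearize` can be invoked from the factory):
```
// Hypothetical static factory on IndexedReference: callers get a nullptr on
// failure instead of having to check an IsValid flag on every object.
std::unique_ptr<IndexedReference>
IndexedReference::create(Instruction &StoreOrLoadInst, const LoopInfo &LI,
                         ScalarEvolution &SE) {
  std::unique_ptr<IndexedReference> R(
      new IndexedReference(StoreOrLoadInst, SE));
  if (!R->delinearize(LI))
    return nullptr; // delinearization failed; hand out no object at all
  return R;
}
```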


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:145
+  // all subscripts must be equal, except the leftmost one (the last one).
+  for (auto SubNum : seq<unsigned>(0, NumSubscripts - 1)) {
+    if (getSubscript(SubNum) != Other.getSubscript(SubNum)) {
----------------
Nice, this is where the coding standard recommends `auto`.


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:172-175
+  if (InSameCacheLine)
+    LLVM_DEBUG(dbgs().indent(2) << "Found spacial reuse.\n");
+  else
+    LLVM_DEBUG(dbgs().indent(2) << "No spacial reuse.\n");
----------------
Some compilers may complain about empty statements in non-debug builds. Alternative:
```
LLVM_DEBUG({
  if (InSameCacheLine)
    dbgs().indent(2) << "Found spacial reuse.\n";
  else
    dbgs().indent(2) << "No spacial reuse.\n";
});
```


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:207
+  unsigned Levels = D->getLevels();
+  for (unsigned Level = 1; Level <= Levels; ++Level) {
+    const SCEV *Distance = D->getDistance(Level);
----------------
Prefer `int` over `unsigned`
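I.e. (a direct restatement of the snippet above with the suggested type, assuming the level count fits in an `int`):
```
int Levels = D->getLevels();
for (int Level = 1; Level <= Levels; ++Level) {
  const SCEV *Distance = D->getDistance(Level);
  // ...
}
```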


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:498-499
+
+  const unsigned CLS = TTI.getCacheLineSize();
+  const Loop *InnerMostLoop = getInnerMostLoop(Loops);
+  assert(InnerMostLoop != nullptr && "Expecting a valid innermost loop");
----------------
The LLVM code base does not use `const` for stack variables.


================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:537
+  LLVM_DEBUG(dbgs() << "\nIDENTIFIED REFERENCE GROUPS:\n";
+             unsigned n = 1;
+             for (const auto &RG : RefGroups) {
----------------
Do you need wrapping behavior? `int` instead of `unsigned`.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63459/new/

https://reviews.llvm.org/D63459




