[PATCH] D63459: Loop Cache Analysis

Tue Jun 18 14:21:04 PDT 2019

etiotto marked 16 inline comments as done.
etiotto added inline comments.

================
Comment at: llvm/include/llvm/Analysis/LoopCacheAnalysis.h:30-31
+class LPMUpdater;
+using CacheCost_t = unsigned long long;
+using LoopVector = SmallVector<Loop *, 8>;
+
----------------
Meinersbur wrote:
> [style] The current code base most often uses a `Ty` suffix for typedefs.
OK I'll use the Ty suffix.

================
Comment at: llvm/include/llvm/Analysis/LoopCacheAnalysis.h:174
+public:
+  static unsigned constexpr DefaultTripCount = 100;
+  static CacheCost_t constexpr InvalidCost = ULLONG_MAX;
----------------
Meinersbur wrote:
> [suggestion] Make these magic numbers `cl::opt` options?
Added cl::opt for the default loop trip count and the temporal reuse threshold (see the .cpp file)

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:63-64
+
+/// Retrieve the innermost loop in the given loop nest \p Loops (we are
+/// expecting to find a single innermost loop).
+static const Loop *getInnerMostLoop(const LoopVector &Loops) {
----------------
Meinersbur wrote:
> Is passing a single innermost loop a precondition or a what this function computes? It seems the only loop that this function might return is `Loops.back()` and kind-of verifies that the loop vector is in the right order?!?
This function attempts to retrieve the innermost loop in the given loop vector. It returns a nullptr if any loops in the loop vector supplied has more than one sibling. The loop vector is expected to contain loops collected in breadth-first order. I'll improve the comment.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:386
+      (L.getExitingBlock())
+          ? SE.getSmallConstantTripCount(&L, L.getExitingBlock())
+          : CacheCost::DefaultTripCount;
----------------
Meinersbur wrote:
> Why not using `getBackedgeTakenCount()`?
I'll do that.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:590-591
+  if (getInnerMostLoop(Loops) == nullptr) {
+    LLVM_DEBUG(dbgs() << "Cannot compute cache cost of loop nest with more "
+                         "than one innermost loop\n");
+    return nullptr;
----------------
Meinersbur wrote:
> If the intention is to provide analysis for innermost loops only, why this limitation? Could it just return an analysis result for each innermost loop?
> 
> If the analysis requires a global view to determine the cost for each loop, wouldn't a FunctionPass be more appropriate? Currently, it seems users first need get the LoopCacheAnalysis for a topmost loops, the ask it for one of its nested loops.
> 
> Are such loop nests not analyzable at all?
> ```
>   while (repeat) {
>     for (int i = ...)
>       for (int j = ...)
>         B[i][j] = ... A[i+1][j+1] ... // stencil
>     for (int i = ...)
>       for (int j = ...)
>         A[i][j] = ... B[i+1][j+1] ... // stencil
>   }
> ```
> 
> 
> 
The current scope of this PR is to analyze loop nests that have a single innermost loop.  The analysis returns a vector of loop costs for each loop in a loop nest. This is not a hard requirement, however I would like to extend the scope of the transformation in a future PR. One of the scenario I had in mind, at least initially, as a consumer of this analysis is loop interchange which operates on perfect nests and therefore the current implementation of the analysis is sufficient for that use.

If we were to make the analysis a function pass that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive especially considering that loop interchange is currently a loop pass.

================
Comment at: llvm/test/Analysis/LoopCacheAnalysis/a.ll:1
+; RUN: opt < %s -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s
+
----------------
Meinersbur wrote:
> [serious] Can you give a more descriptive file names that `a.ll` and `b.ll` etc?
Ooops, yes I'll rename those files.

================
Comment at: llvm/test/Analysis/LoopCacheAnalysis/a.ll:4
+target datalayout = "e-m:e-i64:64-n32:64"
+target triple = "powerpc64le-unknown-linux-gnu"
+
----------------
Meinersbur wrote:
> [suggestion] Is it possible to leave the triple unspecified to the test case?
Unfortunately is not easily removed because the target triple is necessary to determine the target architecture cache line size, which is used in the analysis.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63459/new/

https://reviews.llvm.org/D63459