[PATCH] D21124: Cache aware Loop Cost Analysis

Thu Jun 30 19:15:35 PDT 2016

tvvikram added a comment.

In http://reviews.llvm.org/D21124#470684, @jfb wrote:

> Quick review, nothing technical.

Thanks! Will address your comments, but few inline comments.

================
Comment at: include/llvm/Analysis/LoopCostAnalysis.h:82
@@ +81,3 @@
+    // Set default values for generic data layout.
+    LineSize = 4;   // Statically setting for now.
+  }
----------------
jfb wrote:
> Could you address this TODO? That's the part I'm most interested in :)
> That could be a separate patch if you want to keep things minimal.
I will work on this and submit a separate patch.

================
Comment at: include/llvm/Analysis/LoopCostAnalysis.h:91
@@ +90,3 @@
+// loop nest. 
+typedef SmallVector<Loop *, 2> LoopNest_t;
+
----------------
jfb wrote:
> LLVM doesn't usually name things with "_t".
"LoopNestType" is better?

================
Comment at: include/llvm/Analysis/LoopCostAnalysis.h:100
@@ +99,3 @@
+    ROWMAJOR
+  } AccessOrder;
+
----------------
jfb wrote:
> Column major is never used in the code. Is it going to be added later?
It is used in LoopCostAnalysis.cpp:334. Can GEP represent column major accesses? If ColumnMajor producers like Fortran frontend convert column major accesses to row major accesses, then the entire enum Order will not be necessary.

================
Comment at: include/llvm/Analysis/LoopCostAnalysis.h:116
@@ +115,3 @@
+  /// The loops and their calculated loop costs.
+  std::map <Loop *, double> LoopCosts;
+
----------------
jfb wrote:
> jfb wrote:
> > Weird format here. Could you run new code through `clang-format`?
> Wouldn't `float` be sufficient for this purpose?
For a loop with trip count 10^6 and a deep nesting of say 5 level nest, the algorithm computes a penalty of (10^6)^5 = 10^30 for the outermost level loop. Since it can easily reach FLOAT_MAX, I had kept it double.

================
Comment at: lib/Analysis/LoopCostAnalysis.cpp:22
@@ +21,3 @@
+
+/* - Build/test perfect loop nests - should probably be moved to Loop Utils - */
+// TODO: Rotated loops forming perfect nests cannot be relied on and hence not
----------------
jfb wrote:
> ?
LoopCost computation works on loop nests. Some of the code below populates perfect nests from LoopInfo. FWIW, I think it can be moved to LoopUtils, so that other optimizations can reuse it.

================
Comment at: test/Analysis/CostModel/X86/matmul-perfect-loopCost.ll:23
@@ +22,3 @@
+; CHECK-NEXT: Loop: for.cond1 Costs: 6.256252e+10
+; CHECK-NEXT: Loop: for.cond  Costs: 2.501750e+11
+; CHECK:      Loop: for.cond  TripCount: 5001
----------------
jfb wrote:
> Is your algorithm stable enough that every platform will produce exactly the same `double` result? You may want to print out the double differently so it's easier to ignore bits of lower significance.
LSBs can vary, I guess. Will shorten to 4 decimal places.

http://reviews.llvm.org/D21124