[PATCH] D124984: [LoopCacheAnalysis] Use smaller numbers when calculating the costs (PR55233)

Mon May 16 10:02:45 PDT 2022

bmahjour added a comment.

Although the problem reported in PR55233 is worsened by D123400 <https://reviews.llvm.org/D123400>, the underlying issue existed even before that (as illustrated by the example in PR55233). This patch therefore won't completely solve the overflow problem; it only makes overflow somewhat less likely to occur. I think a proper solution to PR55233 would need to address the overflow problem itself (e.g., by increasing the range of values that the cost model can represent).

I had thought about this approach when working on D123400 <https://reviews.llvm.org/D123400>, and while I agree it will make the cost values smaller, I have some reservations about its accuracy.

One observation is that with this patch, the cost values for non-consecutive accesses in deep nests become smaller, but their relative differences shrink as well. For example, in `single-store.ll` the costs of for.k and for.j differ by orders of magnitude, while the difference between for.j and for.i is much smaller. I worry that this dilution of cost differences might make it harder to find the right order in loop nests that contain multiple reference groups. I also think this approach diverges slightly from the concepts presented in the paper, inasmuch as the cost value is meant to be measured in cache lines; once it is multiplied by the "depth" factor, the value no longer estimates a number of cache lines.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:323
+    // equal to the iterations of the i-loop multiplied by 4. Similarly if we
+    // assume the i-loop is in the innermost position, the cost would be equal
+    // to the iterations of the i-loop multiplied by 3.
----------------
i-loop -> j-loop?

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:324
+    // assume the i-loop is in the innermost position, the cost would be equal
+    // to the iterations of the i-loop multiplied by 3.
     RefCost = TripCount;
----------------
i-loop -> j-loop?

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:332
+    const SCEV *DepthFactor =
+        SE.getConstant(WiderType, getNumSubscripts() - Index - 1);
+    RefCost = SE.getMulExpr(SE.getNoopOrAnyExtend(RefCost, WiderType),
----------------
This will always give a factor of 0 for the inner-most subscript. Is that intentional?

================
Comment at: llvm/test/Analysis/LoopCacheAnalysis/PowerPC/loads-store.ll:13

-; CHECK: Loop 'for.i' has cost = 3000000
-; CHECK: Loop 'for.k' has cost = 2030000
-; CHECK: Loop 'for.j' has cost = 1060000
+; CHECK: Loop 'for.i' has cost = 6000000
+; CHECK-NEXT: Loop 'for.k' has cost = 2030000
----------------
Why does this change here when it didn't change in D123400?

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D124984