[PATCH] D124926: [LoopInterchange] New cost model for loop interchange

Thu May 19 07:45:17 PDT 2022

bmahjour added a comment.

Other than the inline comments looks good to me.

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:63

+static cl::opt<bool> EnableLegacyCostModel(
+    "loop-interchange-enable-legacy-cost-model", cl::init(true), cl::Hidden,
----------------
Not sure if we need an option, I slightly prefer removing it, because the fact that we have an "EnableLegacy" option for it that defaults to "true" might incorrectly imply that we still use the old cost model where in fact we use the new one for majority of cases.

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:512
+    // and populate the <Loop, index> pair into a map for constant time query
+    // later.
+    const auto &LoopCosts = CC->getLoopCosts();
----------------
Add a comment about what the index reprsents (ie the optimal order of loops in the nest).

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:1180
+  int Cost = 0;
+  if (EnableLegacyCostModel) {
+    Cost = getInstrOrderCost();
----------------
I think we should try to limit the use of legacy cost model only to cases were LoopCacheCost fails due to delinearization, as you mentioned in the description. Can we do it like this:

```
if (CostMap.find(InnerLoop) != CostMap.end() &&
      CostMap.find(OuterLoop) != CostMap.end()) {
...
} else {
   Cost = getInstrOrderCost();
    LLVM_DEBUG(dbgs() << "Cost = " << Cost << "\n");
    if (Cost < -LoopInterchangeCostThreshold)
      return true;
}
```

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:1192

+  // Both cost models must have decided not profitable to interchange. Output
+  // remarks for the legacy cost model if it is enabled.
----------------
Can we unify the optimization remark output for both cost models? I don't think the specific cost values are of interest to the user, so just saying whether we found the loops profitable or not should suffice. For developers  interested in the actual cost, they can get it through -debug output.

================
Comment at: llvm/test/Transforms/LoopInterchange/call-instructions.ll:2
 ; REQUIRES: asserts
-; RUN: opt < %s -basic-aa -loop-interchange -pass-remarks-missed='loop-interchange' -pass-remarks-output=%t -S \
+; RUN: opt < %s -mcpu=tsv110 -basic-aa -loop-interchange -pass-remarks-missed='loop-interchange' -pass-remarks-output=%t -S \
 ; RUN:     -verify-dom-info -verify-loop-info -stats 2>&1 | FileCheck -check-prefix=STATS %s
----------------
If we use `powerpc64le-unknown-linux-gnu` as the target triple, then we wouldn't need to specify `-mcpu`, since `getCacheLineSize()` returns valid cache line size for all Power subtargets.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124926/new/

https://reviews.llvm.org/D124926