[PATCH] D144226: [Loop-Interchange] Allow inner-loop only reductions

Wed May 17 12:48:48 PDT 2023

Meinersbur added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:279-284
+  struct PHIUseChain {
+    LoadInst *Load;
+    PHINode *LoopHeaderPHI;
+    PHINode *LoopExitPHI;
+    StoreInst *Store;
+  };
----------------
Can you document this data structure and its fields?

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:362
+  // Set of Removable PHI chains
+  SmallVector<PHIUseChain, 8> RemovablePHIs;
 };
----------------
I think "RemovablePHI" is the wrong name. The PHIs are not just removed, but demoted. Suggestion: "SinkablePHIs" ("sink" as in "sink into loop body").

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:536
     if (CC != nullptr) {
       const auto &LoopCosts = CC->getLoopCosts();
       for (unsigned i = 0; i < LoopCosts.size(); i++) {
----------------
In the loop optimization group call we discussed whether the CacheCost analysis will have the wrong result because it only sees the promoted reduction register, not the additional load/stores in the innermost loop.

After thinking about it, CacheCost is defined in number of touched cache lines: This does not change for the outer loop, and the inner loop only accesses the same cache line repeatedly, so I don't think it matters.

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:730
   }
-
   return nullptr;
----------------
[nit] unrelated change

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:846
+    for (auto I = Begin; I != End; I++)
+      if ((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
+          DI->depends(SI, &*I, false))
----------------
There are other instructions that can access memory, e.g. CallInst (as in `memset(...)`). `tightlyNested` might already check for such instructions; if so, please add a comment about this here.

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:847
+      if ((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
+          DI->depends(SI, &*I, false))
+        return true;
----------------
DI might not be necessary there. Since the idea is to undo what LICM does, you can use the same-strength analysis as LICM. LICM uses AliasSetTracker and MemorySSA, i.e. just asking AliasAnalysis/MemorySSA whether the pointers cannot alias (which DI does internally as well) should be sufficient.

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:855
+  // 2. Store and instructions from L's ExitBlock to Store.
+  if (conflicts(++Load->getIterator(),
+                L->getLoopPreheader()->getTerminator()->getIterator(), Store) ||
----------------
`++` modifies the iterator. Might have consequences if `getIterator()` returns a reference.

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:884
+    // Load should only be used by the PHI node.
+    assert(Load->isSimple() && "Expecting a simple load!");
+    if (!Load->hasOneUser())
----------------
If I am not mistaken, there is no previous check whether all accesses are simple already. That is, it will crash ind debug builds if someone uses e.g. `std::atomic`. I think it should also `return False` in this case. 

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:889-891
+      if (PromotedVarUser == PHI) {
+        continue;
+      }
----------------
[style]

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:909
+        assert(Store->isSimple() && "Expecting a simple store!");
+        // Value *Addr = Store->getPointerOperand();
+        if (Load->getPointerOperand() == Store->getPointerOperand()) {
----------------
Leftover code

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:945
+    // Insert a copy of Load in the Loop Header
+    Instruction *LI = Load->clone();
+    LI->setName(Load->getName() + ".demoted");
----------------
It seems unnecessary to clone the instruction and create a new one? Why not using the existing instruction and move it into the loop?

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:954
+    SI->setOperand(0, LoopExitPHI->getIncomingValue(0));
+    // SI->insertAfter(dyn_cast<Instruction>(LoopExitPHI->getIncomingValue(0)));
+    SI->insertBefore(&*InnerLoop->getLoopLatch()->getTerminator());
----------------
[nit] Leftover code

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:957-959
+    Store->eraseFromParent();
+    LoopHeaderPHI->eraseFromParent();
+    Load->eraseFromParent();
----------------
Shouldn't the `LoopExitPHI` be removed as well?

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:1067
+          Instruction *UI = dyn_cast<Instruction>(U);
+          return !isRemovableInst(UI) && UI->getParent() &&
+                 (!PN || (!OuterInnerReductions.count(PN) &&
----------------
Unless you removed it yourself, shouldn't all instructions have a parent?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144226/new/

https://reviews.llvm.org/D144226