[PATCH] D156889: [mlir][cf] Add ControlFlow to SCF lifting pass

Thu Aug 3 12:01:23 PDT 2023

zero9178 marked an inline comment as done.
zero9178 added inline comments.

================
Comment at: mlir/lib/Transforms/Utils/CFGToSCF.cpp:478
+  // or predecessors are from outside the cycles.
+  for (Block *block : cycles) {
+    for (auto pred = block->pred_begin(); pred != block->pred_end(); pred++) {
----------------
Mogball wrote:
> I'm not sure how big the SCCs get in practice, but I think DFS over the SCC would have better complexity  (O(E+V) I think) than looping over every in/out edge of every block 
Could you elaborate on your idea? I am not sure I follow as the edges we are interested in are both incoming edges from outside blocks and outgoing edges to outside blocks therefore requiring that we iterate over all of them. I am not sure how you could reduce the complexity here. A DFS over the SCC would simply yield me the SCC + all its transitive successors which would still be missing the entry edges into the SCC. Complexity wise, both the DFS and the successor check here also done one hash set lookup per outgoing edge.

================
Comment at: mlir/test/Conversion/ControlFlowToSCF/test.mlir:530
+// CHECK:      %[[WHILE:.*]]:{{.*}} = scf.while
+// CHECK-SAME: %[[ARG1:.*]] = %[[IF]]#0
+// CHECK-SAME: %[[ARG2:.*]] = %[[IF]]#1
----------------
Mogball wrote:
> Yeah I was afraid of something like this when I skimmed through the algorithm. "Perfect reconstructibility" comes at a cost, but I still think this pass will be very useful when the input is "mostly nice".
There are even papers saying that irreducible control flow is incredibly rare :) I felt it was somewhat trivial to make irreducible loops work as shown in the paper however. 
There is also room for improvement in the future with smarter edge multiplexer as we currently are essentially adding all block arguments of entry blocks up despite all but the block args of the block being branched to being undef. I.e. all the block arguments stemming from entry blocks in the multiplexer block are used mutually exclusively. 
By creating a mapping of what block arguments of what type already exist on the multiplexer block you could then reduce the amount of block arguments. That combined with canonicalizations on the SCF ops and `arith.select` would make this a bit nicer, although the general structure with the switch/if inside the loop is likely to remain. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156889/new/

https://reviews.llvm.org/D156889