[PATCH] D61397: [DAGCombiner] Limit number of nodes to explore, to avoid quadratic compile time.

Thu May 2 08:21:00 PDT 2019

niravd added a comment.

Notes inline, but I think the majority of the compile time improvements come from keepign TokenFactor operand counts bounded. This should be changed to do reflect that.

That said, I think rewriting the "CanMergeStoresTo" to "GetMaximumMergedStoreSize" and using that as a first-order filter on nodes to consider like was discussed in D60133 <https://reviews.llvm.org/D60133> should be done before any heuristic restrictions (including TokenFactor operand count limits) should be considered as it causes no code generation degradation, only compile time improvements.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1767
 SDValue DAGCombiner::visitTokenFactor(SDNode *N) {
+  // Limit the number of nodes to explore, to avoid quadratic compile times.
+  // The limit helps to limit compile time for basic blocks with a large number
----------------
I this that this is too restrictive. This check effectively curtails the second and third passes (The first being the quick two operand check). 

The third pass is a fairly expensive chain predecessor search to see if there are redundancies in the operands (if A and its chain predecessor B are both in the list, you can drop B) but already has a cutoff of 1024 steps. 

The second pass is a linear pass to weed out trivial redundancies in the list. Due to the way that chain replacements happen, it's extremely common for TokenFactors to have redundant operands in them and I worry that curtialing this will have nontrivial degradation for relatively little compile time improvement. 

If this is making a notable difference, I suspect we should  change the second pass to not inline a TokenFactor if the number of operands in the new TokenFactor would be too large.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14852
     RootNode = Ldn->getChain().getNode();
-    for (auto I = RootNode->use_begin(), E = RootNode->use_end(); I != E; ++I)
+    for (auto I = RootNode->use_begin(), E = RootNode->use_end(); I != E; ++I) {
+      if (!MaxNumberOfNodesToCheck--)
----------------
nit: fold this into the loop test and increment (MaxNumberOfNodesToCheck> 0 , --MaxNumberOfNodesToCheck)

nit: Since we merge mainly powers of two, the limit should one or slightly higher (1025?)

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:19999

+  // Limit the number of chains to explore, to avoid quadratic compile times.
+  // The limit helps to limit compile time for basic blocks with a large number
----------------
It seems like the underlying  problem you're trying to avoid is having a TokenFactor with too many operands. That's definitely an issue. We do try to avoid it in the SelectionDAGBuilder, but not consisently elsewhere.

This is indirectly doing this by forcing us to only consider a smaller set of chains. Given that this catches mostly long chains of independant chains with mostly the same real dependences, it would probably be much better to do the calculation here as is, but instead of making larger tokenfactors, split them into a hierarchy of some sort. Maybe this should be folded into the TokenFactor case of getNode smart constructor.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61397/new/

https://reviews.llvm.org/D61397