[PATCH] D76268: [MLIR] Fix op folding to not run pre-replace when not constant folding

Tue Mar 17 02:17:33 PDT 2020

bondhugula marked an inline comment as done.
bondhugula added inline comments.

================
Comment at: mlir/lib/Transforms/Utils/FoldUtils.cpp:92

-  // Otherwise, replace all of the result values and erase the operation.
+  // Constant folding succeeded. We will start replacing this op's uses and
+  // erase this op. Invoke the callback provided by the caller to perform any
----------------
rriddle wrote:
> Hmmm, can we instead add a flag to the callback detailing if the op was just updated in-place? I'd say for canonicalization we still want to reconsider the original operands and the users of the operation even when it gets updated.
There is a tradeoff to make here. If the op gets updated, but doesn't really fold to a constant: would  you really want to add its users and operands to the worklist? Although some of the original operands might have been removed and will have a lower use count, we won't be sure if that's true and for how many (because we won't know in what way the op got updated). So you risk adding more and more wasteful worklist items for that iteration. If those operand use values end up with say 0 uses, you'd anyway catch them later - either in that iteration or the next iteration since 'changed' would be set to true. 

In striking contrast, in the case of constant folding, you definitely know that all original operands are wiped out, and so it makes perfect sense to add all of them to the worklist as well as add users of its results to the worklist, which the existing code does. But there is something to think about for the cases where there isn't true folding (aka constand folding). As a concrete example, consider forOp bound canonicalization: here, dropping unused bound operands is being done as part of tryToFold and it returns true when that happens. Now, we don't need to add all of the original operands back to the worklist - they'll be caught in the worklist subsequently and a good subset of them are probably still on the operand list after the folding, which means it's wasted work for their pattern matches.

Similarly, adding the users of a result when you don't do a true fold but just an update might make things slower: the situations where the user is affected because the defining op has been updated (as opposed to folded) are even less common(?) (compared to the operand case). So you risk creating 0 work pattern matches in that iteration. And since we set 'changed' to already true, all of these ops would anyway get into the worklist again.

Let me know what you think - I wish we had some precise timing tests on a good mix of heavy use cases. But there aren't many ops where fold does an in-place update - so either choice may potentially have little impact.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76268/new/

https://reviews.llvm.org/D76268