[PATCH] D76268: [MLIR] Fix op folding to not run pre-replace when not constant folding

Tue Mar 17 18:56:12 PDT 2020

bondhugula marked an inline comment as done.
bondhugula added inline comments.

================
Comment at: mlir/lib/Transforms/Utils/FoldUtils.cpp:92

-  // Otherwise, replace all of the result values and erase the operation.
+  // Constant folding succeeded. We will start replacing this op's uses and
+  // erase this op. Invoke the callback provided by the caller to perform any
----------------
rriddle wrote:
> rriddle wrote:
> > bondhugula wrote:
> > > rriddle wrote:
> > > > Hmmm, can we instead add a flag to the callback detailing if the op was just updated in-place? I'd say for canonicalization we still want to reconsider the original operands and the users of the operation even when it gets updated.
> > > There is a tradeoff to make here. If the op gets updated, but doesn't really fold to a constant: would  you really want to add its users and operands to the worklist? Although some of the original operands might have been removed and will have a lower use count, we won't be sure if that's true and for how many (because we won't know in what way the op got updated). So you risk adding more and more wasteful worklist items for that iteration. If those operand use values end up with say 0 uses, you'd anyway catch them later - either in that iteration or the next iteration since 'changed' would be set to true. 
> > > 
> > > In striking contrast, in the case of constant folding, you definitely know that all original operands are wiped out, and so it makes perfect sense to add all of them to the worklist as well as add users of its results to the worklist, which the existing code does. But there is something to think about for the cases where there isn't true folding (aka constand folding). As a concrete example, consider forOp bound canonicalization: here, dropping unused bound operands is being done as part of tryToFold and it returns true when that happens. Now, we don't need to add all of the original operands back to the worklist - they'll be caught in the worklist subsequently and a good subset of them are probably still on the operand list after the folding, which means it's wasted work for their pattern matches.
> > > 
> > > Similarly, adding the users of a result when you don't do a true fold but just an update might make things slower: the situations where the user is affected because the defining op has been updated (as opposed to folded) are even less common(?) (compared to the operand case). So you risk creating 0 work pattern matches in that iteration. And since we set 'changed' to already true, all of these ops would anyway get into the worklist again.
> > > 
> > > Let me know what you think - I wish we had some precise timing tests on a good mix of heavy use cases. But there aren't many ops where fold does an in-place update - so either choice may potentially have little impact.
> > > If those operand use values end up with say 0 uses, you'd anyway catch them later - either in that iteration or the next iteration since 'changed' would be set to true.
> > An extra iteration is always going to be more expensive. We should be trying to limit the number of iterations, each one involves adding *every* op to the worklist as well as additional(likely unnecessary in these cases) region simplifications.
> > 
> > > Now, we don't need to add all of the original operands back to the worklist 
> > You could try and say the same for every op, there is nothing special here. The number of operations that fold in-place is extremely small, and almost all of the ones I know of involve removing operands.
> > 
> > > Similarly, adding the users of a result when you don't do a true fold but just an update:
> > I don't view this one similarly. I could see this being avoided for plain in-place updates, but this is a stark contrast from the common case of in-place folds. 
> Though, given that there are so few cases of in-place updates ATM I'm fine with punting until we have a case with unnecessary additional iterations. 
>An extra iteration is always going to be more expensive. We should 
>be trying to limit the number of iterations, each one involves adding

Yes, but that extra iteration is anyway happening (irrespective of this change) because we are setting 'changed' to true with in-place updates as well! Did you mean to return failure() instead of success() in the case of in-place updates? Then, the extra iteration won't happen (but that won't be right).

>You could try and say the same for every op, there is nothing 
>special here. The number of operations that fold in-place is 

In the case of constant folds, the use count of each value involved in the operands reduces by exactly one - this may not happen for in-place updates and even if it happens, it may not for all its operands. Second, in the case of constant folds, the chances of users needing an update is much higher than in the case of users with in-place updates. So, the two scenarios are quite different. Again, w/ or w/o this patch, note that in both cases, we are setting 'changed' to true and so the extra iteration (which adds every op) will happen.

Let me know if I'm missing something here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76268/new/

https://reviews.llvm.org/D76268