[PATCH] D77320: [MLIR] fix/update affine data copy utility for max/min bounds

Sun Apr 5 12:16:53 PDT 2020

dcaballe added inline comments.

================
Comment at: mlir/lib/Dialect/Affine/Transforms/AffineDataCopyGeneration.cpp:276
+  AffineStoreOp::getCanonicalizationPatterns(patterns, &getContext());
+  applyPatternsGreedily(f, std::move(patterns));
 }
----------------
bondhugula wrote:
> mehdi_amini wrote:
> > bondhugula wrote:
> > > dcaballe wrote:
> > > > I'm not asking for any changes now, but just wondering if it would make sense in the future to do all of these "clean-up" optimizations in separate pass(es) that we can invoke as needed, maybe after running a bunch of optimizations, instead of trying to optimize right after each one if it's not absolutely necessary. I guess that could reduce compile time and avoid duplicating this clean-up per pass. IIRC, loop fusion also performed some optimization around temporary tensors after fusion. Not sure if those optimizations would also fit into this category.
> > > Yes, this is an issue common to several passes - namely, whether we want to do lightweight cleanup at the end. If it's really simple canonicalizations, it should really have no impact on compile time (so long as you are doing only the necessary stuff). Its real benefit is that it makes the output of the pass more intuitive to read and test cases easier to write / more readable. One issue here is that the current greedy pattern rewriter would run folding and DCE on *all* ops irrespective of the patterns, and so we get all sorts of unexpected simplifications from the pass and in the test cases. I'm sending out a patch/proposal to add a flag to applyPatternsGreedily that makes it only run the supplied patterns and not do any folding/DCE. This is also needed when entering the pass/utility when you want to canonicalize things by selectively applying some patterns (instead of requiring the client to do it). It's not always feasible to check whether it's already in the canonical form - that would require a lot of extra code.
> > It is common that a pass would clean up after itself when it knows exactly what to clean up: for example, while you're promoting a single iteration loop you know that you may have specific code to clean in the promoted block and you perform that cleanup directly.
> > This is very targeted and "cheap".
> > 
> > Here it seems borderline though: it applies some canonicalization patterns unconditionally at the function scope level.
> > 
> That's right. Ideally, I'd like to avoid doing function scope canonicalizations and only restrict this cleanup to the load/store ops we touched (which can be easily collected and they are few in number - as many as the memrefs packed/copied) - but the current pattern rewriter doesn't support that.
Agree! I like the idea of doing some trivial simplifications (e.g., a 1-iteration loop) as long as it is on code that the pass is generating/modifying and it's not too involved to do so. However, "trivial" and "involved" are somewhat subjective terms, and I think this is one of those things that will get convoluted over time if we don't keep it really low profile. My personal opinion is that between simplicity and convenience, I lean towards the former, and I would do this only for really trivial cases. I would say that bookkeeping operations throughout the algorithm just so they can be simplified at the end for convenience is a bit borderline for me. Of course, this is my personal opinion, totally arguable :).

I think we should also not simplify anything if the pass doesn't do any core transformation. Otherwise, that might create a dependency between the simplifications done by the pass and subsequent ones. Should we limit the load/store simplification to cases where there is at least one copyNest?
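
To make the suggestion a bit more concrete, here is a toy, self-contained C++ model of the gating discussed in this thread: clean up only the ops the pass touched, and only if at least one copy nest was generated. `MemOp`, `CopyGenResult`, and `cleanupTouchedOps` are hypothetical names for illustration only, not MLIR APIs; as noted above, the current pattern rewriter doesn't support restricting the cleanup this way, so this is just a sketch of the intended control flow.

```cpp
#include <vector>

// Hypothetical stand-in for a load/store op; NOT an MLIR type.
struct MemOp {
  bool simplified = false; // set once the (modeled) cleanup has run on it
};

// Hypothetical summary of what the copy generation step did.
struct CopyGenResult {
  unsigned numCopyNests = 0;       // copy nests generated by the pass
  std::vector<MemOp *> touchedOps; // load/store ops the pass rewrote
};

// Cleans up only the ops the pass touched, and only if the pass performed
// its core transformation. Returns the number of ops cleaned up.
unsigned cleanupTouchedOps(CopyGenResult &result) {
  // No copy nest generated: leave the IR alone so the pass has no
  // simplification side effects when it did not transform anything.
  if (result.numCopyNests == 0)
    return 0;

  unsigned numCleaned = 0;
  for (MemOp *op : result.touchedOps) {
    if (op->simplified)
      continue; // already in the (modeled) canonical form
    op->simplified = true; // placeholder for applying the targeted patterns
    ++numCleaned;
  }
  return numCleaned;
}
```

With this shape, a run that generates no copy nests returns 0 and leaves every op untouched, which is the behavior I'm asking about.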

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77320/new/

https://reviews.llvm.org/D77320