[PATCH] D77320: [MLIR] fix/update affine data copy utility for max/min bounds

Uday Bondhugula via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Apr 4 00:19:42 PDT 2020


bondhugula marked 2 inline comments as done.
bondhugula added inline comments.


================
Comment at: mlir/lib/Analysis/Utils.cpp:106
+  // Use a copy of the region constraints that has upper/lower bounds for each
+  // memref dimension with static size added to guard against potential
+  // over-approximation from projection or union bounding box. We may not add
----------------
andydavis1 wrote:
> bondhugula wrote:
> > andydavis1 wrote:
> > > Have you run into over approximations again?
> > It's the same over-approximation that existed before, but I've changed affine data copy generation to use /*addMemRefDimBounds=*/false with MemRefRegion::compute to prevent redundant bounds from being added in the common case. So, instead, the bounds are added when getting the constant bounding size and shape, but not when we do getLowerAndUpperBound on that region to get the range for the copy loops. This basically means the code that does the copying now risks going out of bounds when there is over-approximation. Ultimately, we shouldn't be using approximation-based projection at all for region computation, and should instead work with the equalities/local expressions to keep the bounds accurate; if that's not possible (due to yet-unimplemented detection or complex cases we may not be interested in), the region computation should just fail and bail out. We have a similar over-approximation with unionBoundingBox. This approximation shouldn't be done for write regions; we should bail out if we can't be exact in those cases.
> > 
> > For this patch, we have two options: (1) keep it like this (use addMemRefDimBounds = false with region compute) and then work on getting rid of the use of project in region compute; once that's done, we don't need to add memref dim bounds anywhere. (2) Use addMemRefDimBounds = true for region computation and update test cases, because there'd be some redundant bounds. This still means we would later need to get rid of the over-approximation (and fail instead) to avoid extra writes (which impact correctness) and extra reads (which may only impact performance). Let me know which one you prefer.
> OK thanks. Yes, let's go with option (1). Do you need to make additional changes to this revision for option (1)?
This revision already does option (1); no additional changes are needed for it.
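For reference, the option (1) flow roughly follows the sketch below (inside the copy-generation utility, where these types are in scope). This is only an illustration: the signatures are approximate for the code around this revision, and loadOrStoreOp/copyDepth are placeholder names, not identifiers from the patch.

  // Compute the region without the per-dimension memref bounds; those are
  // added later, only when querying the constant bounding size/shape.
  MemRefRegion region(loadOrStoreOp->getLoc());
  if (failed(region.compute(loadOrStoreOp, copyDepth, /*sliceState=*/nullptr,
                            /*addMemRefDimBounds=*/false)))
    return failure();

  // Constant bounding size and shape: guarded internally against
  // over-approximation from projection / union bounding box.
  SmallVector<int64_t, 4> fastBufferShape;
  Optional<int64_t> numElements =
      region.getConstantBoundingSizeAndShape(&fastBufferShape);

  // Copy-loop ranges come straight from the region constraints, so any
  // over-approximation left in the region shows up in the generated copies.
  AffineMap lbMap, ubMap;
  region.getLowerAndUpperBound(/*pos=*/0, lbMap, ubMap);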


================
Comment at: mlir/lib/Dialect/Affine/Transforms/AffineDataCopyGeneration.cpp:276
+  AffineStoreOp::getCanonicalizationPatterns(patterns, &getContext());
+  applyPatternsGreedily(f, std::move(patterns));
 }
----------------
mehdi_amini wrote:
> bondhugula wrote:
> > dcaballe wrote:
> > > I'm not asking for any changes now, but just wondering if it would make sense in the future to do all of these "clean-up" optimizations in a separate pass (or passes) that we can invoke as needed, maybe after running a bunch of optimizations, instead of trying to optimize right after each one if it's not absolutely necessary. I guess that could reduce compile time and avoid duplicating this clean-up per pass. IIRC, loop fusion also performed some optimization around temporary tensors after fusion. Not sure if that optimization would also fit into this category.
> > Yes, this is an issue common to several passes, as is the question of whether we want to do lightweight cleanup at the end. If it's really simple canonicalizations, it should have no impact on compile time (so long as you are doing only the necessary stuff). Its real benefit is that it makes the output of the pass more intuitive to read and the test cases easier to write / more readable. One issue here is that the current greedy pattern rewriter runs folding and DCE on *all* ops irrespective of the patterns, and so we get all sorts of unexpected simplifications from the pass and in the test cases. I'm sending out a patch/proposal to add a flag to applyPatternsGreedily that makes it run only the supplied patterns and not do any folding/DCE. This is also needed on entry to a pass/utility when you want to canonicalize things by selectively applying some patterns (instead of requiring the client to do it); it's not always feasible to check whether the input is already in canonical form, as that would require a lot of extra code.
> It is common for a pass to clean up behind itself when it knows exactly what to clean up: for example, while promoting a single-iteration loop you know you may have specific code to clean in the promoted block, and you perform those cleanups directly.
> This is very targeted and "cheap".
> 
> Here it seems borderline though: it applies some canonicalization patterns unconditionally at function scope.
> 
That's right. Ideally, I'd like to avoid function-scope canonicalizations and restrict this cleanup to the load/store ops we touched (which can be easily collected, and they are few in number: as many as the memrefs packed/copied), but the current pattern rewriter doesn't support that. A hypothetical sketch of what that could look like is below.
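To make that concrete, the hypothetical narrower cleanup could look like the sketch below, assuming (a) the copy generation utility hands back the load/store ops it created or rewrote (copyLoadStoreOps is an invented name) and (b) the rewrite driver grows a way to apply patterns to an individual op, which, as noted, it doesn't support today.

  // Hypothetical per-op cleanup; neither copyLoadStoreOps nor a per-op
  // pattern-application entry point exists in this revision.
  OwningRewritePatternList patterns;
  AffineLoadOp::getCanonicalizationPatterns(patterns, &getContext());
  AffineStoreOp::getCanonicalizationPatterns(patterns, &getContext());
  for (Operation *op : copyLoadStoreOps)
    applyPatternsOnOp(op, patterns); // placeholder for the missing facility

With something like that, the function-scope applyPatternsGreedily call in the snippet above wouldn't be needed, and the greedy driver's folding/DCE over unrelated ops would be avoided.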


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77320/new/

https://reviews.llvm.org/D77320




