[PATCH] D74342: [mlir] Add MemRef filter to affine data copy optimization

Tue Feb 11 16:20:43 PST 2020

dcaballe added a comment.

Thanks for the feedback!

================
Comment at: mlir/include/mlir/Transforms/LoopUtils.h:175
 /// position), all of the fast memory capacity is assumed to be available for
 /// processing this block range.
 uint64_t affineDataCopyGenerate(Block::iterator begin, Block::iterator end,
----------------
bondhugula wrote:
> Doc comment update needed for filter memref.
Thanks Uday! I had added doc to the cpp file. I'll bring both docs into sync.

================
Comment at: mlir/include/mlir/Transforms/LoopUtils.h:228
+void gatherLoops(Block *block, unsigned currLoopDepth,
+                 DenseMap<unsigned, SmallVector<AffineForOp, 2>> &depthToLoops);
+
----------------
mehdi_amini wrote:
> Seems like a private implementation for the one below, do we need to expose it publicly?
> The API is leaking the fact that this is recursion helper...
Good point! Thanks!

================
Comment at: mlir/lib/Transforms/Utils/LoopUtils.cpp:1636
     if (auto loadOp = dyn_cast<AffineLoadOp>(opInst)) {
-      if ((loadOp.getMemRefType().getMemorySpace() !=
+      if ((filterMemRef.hasValue() &&
+           filterMemRef.getValue() != loadOp.getMemRef()) ||
----------------
rriddle wrote:
> You can just do `filterMemRef != loadOp.getMemRef()`
We need the `hasValue` part. Otherwise, the condition would be true when `filterMemRef` has no value. I removed the `getValue` call, though :) Thanks!

================
Comment at: mlir/lib/Transforms/Utils/LoopUtils.cpp:1796-1797
+    }
+  }
+}
+
----------------
mehdi_amini wrote:
> DenseMap is a surprising choice to me for a container that will be filled with monotonic indices starting at zero...
> Can this be just a SmallVectorImpl<SmallVector<AffineForOp, 2>> ?
> 
> I think you just need to add this:
>   if (currLoopDepth >= depthToLoops.size()) depthToLoops.resize(currLoopDepth);
> 
> at the beginning of the function and change nothing else.
> 
I forgot to ask the same question to @andydavis1! I thought that maybe there was a use case that I wasn't taking into account. Even if we wanted to update the container after a loop transformation, we would be adding or removing loop levels to/from the back. ?

Another comment about the current implementation is that it adds an extra "empty" level. When we process the innermost loop and we do:
```
auto &loopsAtDepth = depthToLoops[currLoopDepth];
``` 
we create a new level that won't contain any loop.

I'll take care of both things if it makes sense to you.

> if (currLoopDepth >= depthToLoops.size()) depthToLoops.resize(currLoopDepth);
Not sure I follow. Wouldn't this condition be always false and cause a realloc every time we add a new level?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74342/new/

https://reviews.llvm.org/D74342