[PATCH] D74954: Add a basic tiling pass for parallel loops

Fri Feb 21 06:24:07 PST 2020

herhut added a comment.

In D74954#1886199 <https://reviews.llvm.org/D74954#1886199>, @ftynse wrote:

> We already several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one?  I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

The complexity of the tiling passes is in the dependency analysis and not the rewriting itself. This one is purely structural. We would need to split the actual tiling rewrite form the analysis with interfaces for generating the tiled operation. I am not sure that is worth the code we would actually get to reuse.

================
Comment at: mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp:52
+  // Compute min(size, dim - offset) to avoid out-of-bounds accesses.
+  // FIXME: Instead of using min, we want to replicate the tail. This would give
+  // the inner loop constant bounds for easy vectorization.
----------------
The replication part could be a separate pattern/pass that rewrites a loop with dynamic upper bounds specified with a constant upper bound (so an affine.min with a constant) into something like

```
bound = affine.min(constant, other thing)
if (bound ==constant) {
  // loop with constant upper bound
} else {
  // loop with dynamic upper bound
}
```

Maybe this is good enough to make LLVM recognize the potential for vectorization. That rewrite could be applied independently.

================
Comment at: mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp:96
+  ParallelLoopTiling(const ParallelLoopTiling &) {} // tileSize is non-copyable.
+  explicit ParallelLoopTiling(int64_t tileSize) { this->tileSize = tileSize; }
+
----------------
Can we have an ArrayRef<unsigned> for tile sizes? So we can have different ones at different levels? I would suspect that we, for vectorization, always want to tile 1x...x1xN.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74954/new/

https://reviews.llvm.org/D74954