[Mlir-commits] [mlir] [mlir][SCF] Deprecate `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` (PR #91878)

Nicolas Vasilache llvmlistbot at llvm.org
Fri May 31 00:06:11 PDT 2024


================
@@ -60,7 +70,117 @@ fillInterchangeVector(ArrayRef<int64_t> interchangeVector,
 // tileUsingSCF implementation.
 //===----------------------------------------------------------------------===//
 
-// Check if `stride` evenly divides the trip count `size - offset`.
+/// Verify the tile size options are set in a consistent manner.
+static LogicalResult
+verifyTileSizeOptions(RewriterBase &rewriter, Location loc,
+                      const scf::SCFTilingOptions &options) {
+  // Specifying the number of tiles is only supported for the `scf.forall` op.
+  if (options.numThreadsComputationFunction &&
+      options.loopType != scf::SCFTilingOptions::LoopType::ForallOp) {
+    return rewriter.notifyMatchFailure(
+        loc, "number of tiles/threads can only by specified when loop type is "
+             "set to use `scf.forall`");
+  }
+
+  // If specified, check that the interchange vector is a permutation.
+  if (!options.interchangeVector.empty()) {
+    if (!isPermutationVector(options.interchangeVector)) {
+      return rewriter.notifyMatchFailure(
+          loc, "invalid interchange vector, not a permutation of the entire "
+               "iteration space");
+    }
+  }
+  return success();
+}
+
+/// Compute the tile sizes and num threads values from the options passed in.
+static std::tuple<SmallVector<OpFoldResult>, SmallVector<OpFoldResult>>
+getTileSizes(RewriterBase &rewriter, TilingInterface op,
+             ArrayRef<Range> iterationDomain,
+             const scf::SCFTilingOptions &options) {
+  OpFoldResult zero = rewriter.getIndexAttr(0);
+  SmallVector<OpFoldResult> tileSizes, numThreads;
+  size_t numLoops = iterationDomain.size();
+
+  // Check whether the number of tiles to use is specified.
+  if (options.numThreadsComputationFunction) {
+    numThreads = options.numThreadsComputationFunction(rewriter, op);
+    numThreads.resize(numLoops, zero);
+
+    // If the tile sizes are also specified, use those.
+    if (options.tileSizeComputationFunction) {
+      tileSizes = options.tileSizeComputationFunction(rewriter, op);
+    } else {
+      // Compute the tile sizes from the iteration domain and number
+      // of tiles as follows
+      // - niters = ceilDiv(ub - lb, step)
+      // - tileSize = ceilDiv(niters, numThreads)
----------------
nicolasvasilache wrote:

My general thinking in such large PRs is to look out for potential causes of differences. 
The ideal scenario is when tests don't change (i.e. the PR is NFC) and the APIs don't change too drastically.
So what I am evaluating here is whether this is good to go despite the changes to test files, or whether something material was changed.

Re 1: we are trading off an increase in API complexity for fewer duplicate ops. I think this is a good change and not materially different; in principle this LGTM. Side note: one way to effect such improvements without significant API changes is to pass an extra struct that memoizes the quantities you already created and want to reuse. This minimizes changes and reduces the probability of error while getting you the same effect.
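
As a purely illustrative sketch of that memoization pattern (the struct and helper names below are made up for this comment; they are not part of the PR or of the existing `scf::SCFTilingOptions` API):

```cpp
// Hypothetical cache of quantities that were already materialized once during
// tiling, threaded by reference through the internal helpers so they can be
// reused instead of re-created (re-creating them is what duplicates IR).
struct TilingCachedValues {
  SmallVector<Range> iterationDomain;
  SmallVector<OpFoldResult> tileSizes;
  SmallVector<OpFoldResult> numThreads;
};

// Each helper only computes what is still missing; call sites keep their
// current signatures plus one extra `TilingCachedValues &` parameter.
static void ensureIterationDomain(RewriterBase &rewriter, TilingInterface op,
                                  TilingCachedValues &cache) {
  if (cache.iterationDomain.empty())
    cache.iterationDomain = op.getIterationDomain(rewriter);
}
```

This keeps the public options struct untouched and confines the change to the internal helpers.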

Beyond the API complexity increase / reduction of redundant IR, there may be a catch: I see this PR introduces `--cse` in one of the tests. My immediate gut feeling is that we are introducing more IR in other places. If `--cse` can be dropped, then I think this first point is resolved.


Re 2: calculateTileOffsetsAndSizes has related logic in the same code location. Now the logic seems split across getTileOffsetAndSizes and getTileSizes/getUserTileSizesAndNumThreads. The rename to getUserTileSizesAndNumThreads helps. So it seems we are doing a `getUserTileSizesAndNumThreads` before any loop is generated, and later a refinement via getTileOffsetAndSizes within the body (passed via a lambda). I believe what made this confusing is that all of it lives within the same function. Can we hoist out the lambdas as helper functions and reduce the complexity here? It was really not clear to me that we had an inline lambda to wrangle with: my recollections from the refactoring 6 months ago do not include this.
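
To make the hoisting suggestion concrete, a rough sketch (the names and signatures below are invented for illustration; only the shape of the refactoring matters):

```cpp
// Before: the per-iteration logic is an inline lambda captured inside
// tileUsingSCF, roughly of the form
//   auto bodyFn = [&](RewriterBase &rewriter, Location loc, ValueRange ivs) {
//     // compute offsets/sizes, create the tiled op, insert results ...
//   };
//
// After: the same body hoisted into a file-local helper, with the lambda
// reduced to a one-line forwarding call, so the driver reads top-down.
static LogicalResult
generateTiledLoopBody(RewriterBase &rewriter, Location loc, TilingInterface op,
                      ArrayRef<OpFoldResult> tileOffsets,
                      ArrayRef<OpFoldResult> tileSizes) {
  // ... former lambda body: getTileOffsetAndSizes refinement, tiled op
  // creation, result insertion ...
  return success();
}
```

That way the split between `getUserTileSizesAndNumThreads` (before loop generation) and the per-iteration refinement is visible from the function names alone.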


Re 3: this should not be in this PR IMO. This is a material change that looks important by itself (a bug fix); I don't think we should have it in the middle of such a big PR with so much code complexity.

https://github.com/llvm/llvm-project/pull/91878


More information about the Mlir-commits mailing list