[Mlir-commits] [mlir] [mlir] [linalg] Add pattern to swap transpose with broadcast (PR #97063)

Stella Laurenzo llvmlistbot at llvm.org
Fri Jul 19 07:33:15 PDT 2024


================
@@ -51,6 +56,60 @@ Some important things to think about w.r.t. canonicalization patterns:
 *   It is always good to eliminate operations entirely when possible, e.g. by
     folding known identities (like "x + 0 = x").
 
+*   Canonicalization isn't a great place to put patterns that are expensive at
+    compile time (e.g. those with O(n) complexity) or that rely on complicated
+    cost models.
+
+*   Canonicalization shouldn't drop the semantics of the original operation.
+
+For example, a pattern that transforms
+
+```
+  %res = vector.transpose %0, [1, 0] : vector<nx1xelty> to vector<1xnxelty>
+```
+
+to
+
+```
+  %res = vector.shape_cast %0 : vector<nx1xelty> to vector<1xnxelty>
+```
+
+is not a good canonicalization pattern because it drops the transpose semantics.
+
+
+A pattern that transforms (assuming the linalg.transpose is the only use of %broadcast)
+
+```
+  %broadcast = linalg.broadcast
+      ins(%input : tensor<2x4x5xf32>)
+      outs(%init1 : tensor<1x2x3x4x5x6xf32>)
+      dimensions = [0, 2, 5]
+  %transpose = linalg.transpose
+      ins(%broadcast : tensor<1x2x3x4x5x6xf32>)
+      outs(%init2 : tensor<1x6x2x3x5x4xf32>)
+      permutation = [0, 5, 1, 2, 4, 3]
+```
+
+to
+
+```
+  %transpose = linalg.transpose
+      ins(%input : tensor<2x4x5xf32>)
+      outs(%tmp_init : tensor<2x5x4xf32>)
+      permutation = [0, 2, 1]
+  %broadcast = linalg.broadcast
+      ins(%transpose : tensor<2x5x4xf32>)
+      outs(%init2 : tensor<1x6x2x3x5x4xf32>)
+      dimensions = [0, 1, 3]
+```
+
+is a good canonicalization pattern because:
+
+1. This pattern is convergent: after the rewrite, the transpose no longer
+   consumes a broadcast, so the pattern cannot re-apply to its own output.
+2. This pattern always transforms the program towards reducing the amount of
+   data being computed on, which gives a clear lattice.
+3. This is not a one-off pattern; new matches may be generated during the
----------------
stellaraccident wrote:

Good q. I think I've seen people using this in two ways: a. not part of a holistic approach to reducing all forms of the implicated ops, and b. does not benefit from the overhead of being run in a fixed-point loop (often to say this is a lowering a la dialect conversion).

In this case, I believe the criticism was (a) but the author may have thought (b). 

It's not that it isn't related to any other canonicalization but that it isn't part of a holistic design.

I would probably split this into two:

- it is part of a holistic, consistent design for related forms.
- it is not a lowering that would be better maintained in a library of one-off patterns that can be included in such pipelines.
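
The split above can be made concrete: the same rewrite can either be registered with the op's canonicalization set and driven to a fixed point by the canonicalizer, or exposed through a populate function that specific pipelines opt into. Here is a minimal sketch assuming a hypothetical pattern class `SwapTransposeWithBroadcast` and an illustrative `populate...` helper; neither name is taken from the PR.

```cpp
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Hypothetical pattern class standing in for the rewrite discussed above.
struct SwapTransposeWithBroadcast
    : public OpRewritePattern<linalg::TransposeOp> {
  using OpRewritePattern::OpRewritePattern;
  LogicalResult matchAndRewrite(linalg::TransposeOp op,
                                PatternRewriter &rewriter) const override {
    // The permutation/dimension bookkeeping is elided here; a standalone
    // sketch of that math follows further below.
    return rewriter.notifyMatchFailure(op, "sketch only");
  }
};

// (a) Part of a holistic canonicalization design: added next to the op's
//     other canonicalization patterns and run to a fixed point by the
//     canonicalizer.
void addToCanonicalizationSet(RewritePatternSet &patterns, MLIRContext *ctx) {
  patterns.add<SwapTransposeWithBroadcast>(ctx);
}

// (b) A library of one-off patterns: pipelines that want the rewrite opt in
//     explicitly, typically applying it once rather than in a fixed-point
//     loop.
void populateSwapTransposeWithBroadcastPatterns(RewritePatternSet &patterns) {
  patterns.add<SwapTransposeWithBroadcast>(patterns.getContext());
}
```

The pattern body is identical either way; what differs is which pipelines pick it up and whether it is re-run to a fixed point.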

https://github.com/llvm/llvm-project/pull/97063
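
For concreteness on the example in the quoted hunk, the permutation and dimension bookkeeping behind the swap can be written as a small standalone helper. This is a sketch under the example's assumption that the transpose is the only user of the broadcast; the function name and data layout are illustrative, not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Given the broadcast's added dimensions (positions in the broadcast result)
// and the transpose permutation over that result, compute the attributes for
// the swapped order: transpose the original input first, then broadcast.
static void swapTransposeWithBroadcast(const std::vector<int64_t> &bcastDims,
                                       const std::vector<int64_t> &perm,
                                       std::vector<int64_t> &newPerm,
                                       std::vector<int64_t> &newBcastDims) {
  int64_t rank = static_cast<int64_t>(perm.size());
  // srcDim[pos] is the input dimension feeding position `pos` of the
  // broadcast result, or -1 if that position was added by the broadcast.
  std::vector<int64_t> srcDim(rank, -1);
  int64_t nextInputDim = 0;
  for (int64_t pos = 0; pos < rank; ++pos)
    if (std::find(bcastDims.begin(), bcastDims.end(), pos) == bcastDims.end())
      srcDim[pos] = nextInputDim++;

  for (int64_t resPos = 0; resPos < rank; ++resPos) {
    int64_t inputDim = srcDim[perm[resPos]];
    if (inputDim == -1)
      newBcastDims.push_back(resPos); // still a broadcast dimension
    else
      newPerm.push_back(inputDim);    // input dim feeding this result position
  }
}

int main() {
  // Attributes from the example: dimensions = [0, 2, 5],
  // permutation = [0, 5, 1, 2, 4, 3].
  std::vector<int64_t> newPerm, newBcastDims;
  swapTransposeWithBroadcast({0, 2, 5}, {0, 5, 1, 2, 4, 3}, newPerm,
                             newBcastDims);
  assert((newPerm == std::vector<int64_t>{0, 2, 1}));      // permutation = [0, 2, 1]
  assert((newBcastDims == std::vector<int64_t>{0, 1, 3})); // dimensions = [0, 1, 3]
  return 0;
}
```

Running it on the example's attributes reproduces the `permutation = [0, 2, 1]` and `dimensions = [0, 1, 3]` shown in the rewritten form, with the transpose now acting on the smaller pre-broadcast tensor.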

