[PATCH] D80772: [mlir] [VectorOps] Use 'vector.flat_transpose' for 2-D 'vector.tranpose'

Fri May 29 03:13:30 PDT 2020

nicolasvasilache added inline comments.

================
Comment at: mlir/include/mlir/Dialect/Vector/VectorOps.h:59
   VectorContractLowering vectorContractLowering = VectorContractLowering::FMA;
+  bool vectorUseFlatTranspose = false;
 };
----------------
Can we use a `struct VectorTransposeLowering` to keep it consistent with the one above (even if for now there are really only 2 options atm) ?

================
Comment at: mlir/test/Dialect/Vector/vector-flat-transforms.mlir:5
+//
+// TODO(ajcbik,ntv): having ShapeCastOp2DDownCastRewritePattern and
+//                   ShapeCastOp2DUpCastRewritePattern too early in
----------------
aartbik wrote:
> sending this out for some early discussion; this hits a phase ordering issue
> 
> I propose to move the shapecast lowering a bit later, so we can fold them first and only lower them when they cannot be eliminated
I think this should be resolved in a separate revision and this is fine for now.

Note that this is more an order of visitation problem. Given:
```
%a = shape_cast %0
%b = shape_cast %a
```
where the shape casts can fold, if %a is visited before %b then it will be expanded.

I have seen this type of behavior a bunch of times in different place (albeit not involving folding + canonicalization + lowering IIRC).

Seems like ShapeCast should have a canonicalizer / canonicalization pattern (`hasCanonicalizer=1`) with a separate match and rewrite.
Then ShapeCastLowering could query that on all its uses and fail to lower if any foldable use is left.

In other words, this type of pattern ordering can be resolved by finer-grained case disjunction.
However this seems like it can still "miss folding at a distance": consider a chain of transposes that are lowered in some arbitrary order introducing reshapes. 
Folding opportunities would only appear if consecutive transposes are lowered before any newly introduced shape_cast is visited.
This seems like the worklist-based algorithm would handle this ordering naturally but I imagine we can construct more intricate cases where this would not be true? 

Pinging @rriddle to see if there are more idiomatic ways of doing this, if this should be integrated in the rewriter itself (i.e. delay pattern application if any operand has folding opportunities), or something else.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80772/new/

https://reviews.llvm.org/D80772