[PATCH] D78327: [mlir][Linalg] Create a named batchmatmul op and pipe it through.

Nicolas Vasilache via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 17 21:16:45 PDT 2020


nicolasvasilache marked an inline comment as done.
nicolasvasilache added a comment.

The TableGen story is awful at the moment; suggestions are most welcome.



================
Comment at: mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOpsSpec.tc:2
+ods_def<BatchMatmulOp>:
+def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
+  C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
----------------
nicolasvasilache wrote:
> silvas wrote:
> > mravishankar wrote:
> > > What is the reference for this specification? ONNX and TF both seem to have a batch dimension for B as well. Without that, this is effectively broadcasting B.
> > This isn't enough to legalize e.g. tf.BatchMatMul or torch.matmul, which allow leading batch dimensions on both sides.
> > 
> > https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/batch-mat-mul
> > https://pytorch.org/docs/stable/torch.html#torch.matmul
> > 
> > In IREE we have a batch matmul op that handles batch on both sides:
> > https://github.com/google/iree/blob/f80f39c7e96c2af15741e9c774eb8b54bf38df28/iree/compiler/Dialect/VMLA/IR/VMLAOps.td#L323
> > 
> > I expect that in a typical lowering flow, we will legalize tf.BatchMatMul or torch.matmul by reshaping all the batch dimensions into a single dimension on both sides (possibly a dummy "1" dimension in case of no batch on one side). Then we can expand this op into generic form and fuse/clean up those reshapes, which will eliminate batch dimensions on either side.
> > 
> > I don't see a situation where we would create this op. 
> > 
> > My intuition is that a batch matmul with a batch dimension on only one side is not that interesting: fundamentally it is the same as a regular matmul, since you can fold the batch dimension into the free dimension of the respective operand (e.g. in the case you have here, you can reshape the two dimensions Batch, M in the LHS into a single dimension of extent Batch*M). Batch matmul is only interesting from a lowering perspective when there is a batch dimension on both sides, which introduces distinct data-reuse behavior compared to a normal matmul.
> > 
> > So in terms of defining a set of "primitives" or lowering to library calls (e.g. https://devblogs.nvidia.com/cublas-strided-batched-matrix-multiply/), having a batch on both sides seems to be the only relevant case. So I would recommend defining this as:
> > ```
> > def batch_matmul(A: f32(Batch, M, K), B: f32(Batch, K, N)) -> (C: f32(Batch, M, N)) {
> > ```
> It's just something to get started; the semantics are variadic and will require extensions.
> 
> Once these are implemented it will be easy to update.
> If you have a strong preference for another op, let me know what you'd prefer (an op that also exercises reduction).
> It can't be dot/matvec/matmul for now because that's already taken and more work is needed to replace them.
I went with @silvas's suggestion; we can iterate on the semantics later once we have variadic support.
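The distinction being debated above can be illustrated with a minimal NumPy sketch (the helper name `batch_matmul_lhs_batch` is hypothetical, not part of the patch): with a batch dimension only on the LHS, B is reused for every batch, and the whole op folds into a plain matmul by reshaping (Batch, M) into one dimension; with a batch dimension on both sides, each batch consumes a different B slice, so no such folding exists.

```python
import numpy as np

# Hypothetical reference semantics for the spec in the patch:
#   C(b, m, n) = sum_k A(b, m, k) * B(k, n)
# B has no batch dimension, so it is effectively broadcast across batches.
def batch_matmul_lhs_batch(A, B):
    Batch, M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((Batch, M, N))
    for b in range(Batch):
        C[b] = A[b] @ B          # the same B is reused for every batch
    return C

A = np.random.rand(4, 3, 5)      # (Batch, M, K)
B = np.random.rand(5, 2)         # (K, N)

# Folding (Batch, M) into a single free dimension of extent Batch*M
# reduces the op to an ordinary matmul, as argued in the review.
C_batched = batch_matmul_lhs_batch(A, B)
C_folded = (A.reshape(4 * 3, 5) @ B).reshape(4, 3, 2)
assert np.allclose(C_batched, C_folded)

# With batch on both sides (the signature silvas recommends), each batch
# uses a different slice of B2, so the folding above no longer applies.
B2 = np.random.rand(4, 5, 2)     # (Batch, K, N)
C_both = np.einsum('bmk,bkn->bmn', A, B2)
assert C_both.shape == (4, 3, 2)
```

This matches the review's conclusion: batch on both operands is the case that is distinct from a regular matmul and maps onto strided-batched library calls.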


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78327/new/

https://reviews.llvm.org/D78327




