[all-commits] [llvm/llvm-project] 03c2f5: [mlir][linalg][conv] Flatten the channel dimension...
Andrzej Warzyński via All-commits
all-commits at lists.llvm.org
Wed Dec 6 13:35:16 PST 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 03c2f5d8bbcf31239a631d9343ac7f4b6b3094c1
https://github.com/llvm/llvm-project/commit/03c2f5d8bbcf31239a631d9343ac7f4b6b3094c1
Author: Andrzej Warzyński <andrzej.warzynski at arm.com>
Date: 2023-12-06 (Wed, 06 Dec 2023)
Changed paths:
M mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
M mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
M mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
M mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
A mlir/test/Dialect/Linalg/vectorize-convolution-flatten.mlir
Log Message:
-----------
[mlir][linalg][conv] Flatten the channel dimension when vectorizing (#71918)
The current vectorization of 1D depthwise convolutions in Linalg is
_sub-optimal_ for tensors with a small number of channels, e.g.:
```mlir
linalg.depthwise_conv_1d_nwc_wc
{dilations = dense<1> : vector<1xi64>,
strides = dense<1> : vector<1xi64>}
ins(%input, %filter : tensor<1x8x3xi8>, tensor<1x3xi8>)
outs(%output : tensor<1x8x3xi8>) -> tensor<1x8x3xi8>
```
That's because ultimately (i.e. at the LLVM level), vectorization happens
along the trailing dimension (i.e. the channel dimension). In this case it
leads to vectors with only 3 elements (or worse, e.g. a single element if
there's only 1 channel). For comparison, a 128-bit wide vector register can
hold 16 x i8.
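To illustrate, with the default lowering the multiply-accumulate ends up
operating on vectors whose innermost dimension has just 3 elements. Roughly
(a hand-written sketch rather than the verbatim vectorizer output; names like
`%v_input` are hypothetical stand-ins for the vectors read from the tensors):
```mlir
// Sketch only - the real IR also contains vector.transfer_read/write ops.
%b_filter = vector.broadcast %v_filter : vector<3xi8> to vector<1x8x3xi8>
%mul = arith.muli %v_input, %b_filter : vector<1x8x3xi8>
%res = arith.addi %mul, %v_output : vector<1x8x3xi8>
```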
Instead, this patch adds an option to flatten/collapse the channel
dimension into the width dimension of the input/filter/output using
the `vector.shape_cast` operation:
```mlir
%sc_input = vector.shape_cast %input : vector<1x8x3xi8> to vector<1x24xi8>
%sc_output = vector.shape_cast %output : vector<1x8x3xi8> to vector<1x24xi8>
%b_filter = vector.broadcast %filter : vector<3xi8> to vector<1x8x3xi8>
%sc_filter = vector.shape_cast %b_filter : vector<1x8x3xi8> to vector<1x24xi8>
```
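The multiply-accumulate then operates on the flattened (1x24) vectors and the
result is cast back to the original shape, roughly (a sketch under the
assumptions above, not the verbatim generated IR):
```mlir
%mul = arith.muli %sc_input, %sc_filter : vector<1x24xi8>
%acc = arith.addi %mul, %sc_output : vector<1x24xi8>
%res = vector.shape_cast %acc : vector<1x24xi8> to vector<1x8x3xi8>
```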
This new vectorization mode is implemented in `depthwiseConv` by
inserting `vector.shape_cast` ops before and after
`depthwiseConv1dSliceAsMulAcc` is invoked. It can be selected via, e.g., a
Transform dialect attribute:
```mlir
transform.structured.vectorize_children_and_apply_patterns %conv {flatten_1d_depthwise_conv}
```
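For reference, a complete transform script exercising the new attribute might
look like the following (a hypothetical sketch; the in-tree test
`vectorize-convolution-flatten.mlir` follows a similar pattern but its exact
sequence may differ):
```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  // Match the function containing the depthwise convolution ...
  %func = transform.structured.match ops(["func.func"]) in %arg0
    : (!transform.any_op) -> !transform.any_op
  // ... and vectorize its children with channel-dim flattening enabled.
  %vectorized = transform.structured.vectorize_children_and_apply_patterns %func
    {flatten_1d_depthwise_conv} : (!transform.any_op) -> !transform.any_op
}
```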
A forthcoming patch will implement a strategy to automatically switch
between the two implementations, depending on the shape of the input
tensors.
Co-authored-by: Bradley Smith <bradley.smith at arm.com>