[all-commits] [llvm/llvm-project] e9bafa: [mlir][tensor] Generalize/restrict `GeneralizeOute...

Wed Nov 6 12:43:09 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e9bafa35d27042f8e1daa4ccf4a30bddf31878e8
      https://github.com/llvm/llvm-project/commit/e9bafa35d27042f8e1daa4ccf4a30bddf31878e8
  Author: Andrzej Warzyński <andrzej.warzynski at arm.com>
  Date:   2024-11-06 (Wed, 06 Nov 2024)

  Changed paths:
    M mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
    M mlir/include/mlir/Dialect/Utils/StaticValueUtils.h
    M mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
    M mlir/lib/Dialect/Utils/StaticValueUtils.cpp
    M mlir/test/Dialect/Linalg/generalize-tensor-pack.mlir

  Log Message:
  -----------
  [mlir][tensor] Generalize/restrict `GeneralizeOuterUnitDimsPackOpPattern` (#114315)

This PR *restricts* `GeneralizeOuterUnitDimsPackOpPattern` to follow its
intended purpose (as per the documentation), which is to:

  > require all outer dimensions of tensor.pack to be 1.

There was one in-tree test that violated this assumption (and happened
to work) – see `@simple_KCRS_to_KRSCsr` in
"generalize-tensor-pack.mlir". That test has been updated to satisfy the
new requirements of the pattern.

By enforcing the pattern to follow its intended design (i.e., making it
stricter), the calculation of shapes and sizes for various Ops that the
pattern generates (PadOp, ExtractSliceOp, EmptyOp, TensorOp, and
InsertSliceOp) becomes much simpler and easier to document. This also
helped *generalize* the pattern to support cases like the one below:

```mlir
func.func @simple_pad_and_pack_dynamic_tile_cst(
    %src: tensor<5x1xf32>,
    %dest: tensor<1x1x?x2xf32>,
    %pad: f32) -> tensor<1x1x?x2xf32> {

  %tile_dim_0 = arith.constant 8 : index
  %0 = tensor.pack %src
    padding_value(%pad : f32)
    inner_dims_pos = [0, 1]
    inner_tiles = [%tile_dim_0, 2]
    into %dest : tensor<5x1xf32> -> tensor<1x1x?x2xf32>

  return %0 : tensor<1x1x?x2xf32>
}
```

Note that the inner tile slice is dynamic but compile-time constant.
`getPackOpSourceOrPaddedSource`, which is used to generate PadOp,
detects this and generates a PadOp with static shapes. This is a good
optimization, but it means that all shapes/sizes for Ops generated by
`GeneralizeOuterUnitDimsPackOpPattern` also need to be updated to be
constant/static. By restricting the pattern and simplifying the
size/shape calculation, supporting the case above becomes much easier.

Notable implementation changes:

* PadOp processes the original source (no change in dimensions/rank).
  ExtractSliceOp extracts the tile to pack and may reduce the rank. All
  following ops work on the tile extracted by ExtractSliceOp (possibly
  rank-reduced).
* All shape/size calculations assume that trailing dimensions match
  inner_tiles from tensor.pack. All leading dimensions (i.e., outer
  dimensions) are assumed to be 1.
* Dynamic sizes for ops like ExtractSliceOp are taken from inner_tiles
  rather than computed as, for example, tensor.dim %dest, 2. It’s the
  responsibility of the "producers" of tensor.pack to ensure that
  dimensions in %dest match the specified tile sizes.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications