[Mlir-commits] [mlir] [mlir][Linalg] Drop unit extent dim in non-trivial expressions (PR #173873)

Mon Dec 29 07:54:19 PST 2025

sommerlukas wrote:

To illustrate the changes in this PR further, consider the following example:
```mlir
#map = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d4, d2 + d5, d3 + d6)>
#map1 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d1, d4, d5, d6)>
#map2 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>
module {
  func.func @drop_unit_dim_binary_expr(%arg0: tensor<1x61x1x1xf32>, %arg1: tensor<48x61x1x1xf32>, %arg2: tensor<1x48x1x1xf32>) -> tensor<1x48x1x1xf32> {
    %2 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %arg1 : tensor<1x61x1x1xf32>, tensor<48x61x1x1xf32>) outs(%arg2 : tensor<1x48x1x1xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %3 = arith.mulf %in, %in_0 : f32
      %4 = arith.addf %out, %3 : f32
      linalg.yield %4 : f32
    } -> tensor<1x48x1x1xf32>
    return %2 : tensor<1x48x1x1xf32>
  }
}
```

In this example, two of the unit extent dimensions of `%arg0` (dimensions 2 and 3) are indexed with non-trivial affine expressions, namely `d2 + d5` and `d3 + d6`.
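These expressions can be dropped because every loop dimension they use has a trip count of 1. The Python sketch below is a simplified, hypothetical model, not the upstream implementation. It infers the unit loop dimensions from the static shapes. Each result expression is encoded as the list of loop dimensions it sums, so `[2, 5]` stands for `d2 + d5`. It assumes a loop range can be read off any operand dimension indexed by a single bare `dN`.

```python
from typing import Dict, List, Set

# Hypothetical encoding: each affine result is the list of loop dims it sums,
# e.g. [2, 5] stands for `d2 + d5`.
Expr = List[int]


def unit_loop_dims(maps: List[List[Expr]], shapes: List[List[int]]) -> Set[int]:
    """Return the loop dims whose static range is 1.

    Simplification: a loop dim's range is read only from operand dims that are
    indexed by that dim alone; compound results such as `d2 + d5` are skipped.
    """
    ranges: Dict[int, int] = {}
    for results, shape in zip(maps, shapes):
        for expr, size in zip(results, shape):
            if len(expr) == 1:
                ranges.setdefault(expr[0], size)
    return {d for d, size in ranges.items() if size == 1}


# Indexing maps and operand shapes from the example above.
arg0_map = [[0], [4], [2, 5], [3, 6]]  # (d0, d4, d2 + d5, d3 + d6)
arg1_map = [[1], [4], [5], [6]]        # (d1, d4, d5, d6)
out_map = [[0], [1], [2], [3]]         # (d0, d1, d2, d3)
shapes = [[1, 61, 1, 1], [48, 61, 1, 1], [1, 48, 1, 1]]

print(sorted(unit_loop_dims([arg0_map, arg1_map, out_map], shapes)))
# prints [0, 2, 3, 5, 6]
```

Only `d1` (size 48) and `d4` (size 61) remain. This matches the two loops (`"parallel"`, `"reduction"`) of the rewritten `linalg.generic` ops.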

Prior to this change, a single application of the transformation would yield the following output:
```mlir
#map = affine_map<(d0, d1) -> (d1, 0, 0)>
#map1 = affine_map<(d0, d1) -> (d0, d1)>
#map2 = affine_map<(d0, d1) -> (d0)>
module {
  func.func @drop_unit_dim_binary_expr(%arg0: tensor<1x61x1x1xf32>, %arg1: tensor<48x61x1x1xf32>, %arg2: tensor<1x48x1x1xf32>) -> tensor<1x48x1x1xf32> {
    %collapsed = tensor.collapse_shape %arg0 [[0, 1], [2], [3]] : tensor<1x61x1x1xf32> into tensor<61x1x1xf32>
    %collapsed_0 = tensor.collapse_shape %arg1 [[0], [1, 2, 3]] : tensor<48x61x1x1xf32> into tensor<48x61xf32>
    %collapsed_1 = tensor.collapse_shape %arg2 [[0, 1, 2, 3]] : tensor<1x48x1x1xf32> into tensor<48xf32>
    %0 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "reduction"]} ins(%collapsed, %collapsed_0 : tensor<61x1x1xf32>, tensor<48x61xf32>) outs(%collapsed_1 : tensor<48xf32>) {
    ^bb0(%in: f32, %in_2: f32, %out: f32):
      %1 = arith.mulf %in, %in_2 : f32
      %2 = arith.addf %out, %1 : f32
      linalg.yield %2 : f32
    } -> tensor<48xf32>
    %expanded = tensor.expand_shape %0 [[0, 1, 2, 3]] output_shape [1, 48, 1, 1] : tensor<48xf32> into tensor<1x48x1x1xf32>
    return %expanded : tensor<1x48x1x1xf32>
  }
}
```

The shape of `%arg0` is collapsed to drop the first unit extent dimension. The other two are retained even though, after the transformation, they are indexed by the constant `0` (see `#map`). This shows that those dimensions could have been dropped as well.
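Every unit loop dimension can only take the value 0, so `d2 + d5` folds to `0` just like a bare `d0`. The Python sketch below models this check using hypothetical names. It substitutes 0 for the unit loop dimensions in `%arg0`'s results and reports which size-1 operand dimensions end up indexed by the constant `0`. For contrast, `droppable_bare_only` is a simplified model of the behavior visible in the first output above, which only treats a bare unit `dN` or a constant `0` as droppable. It is not the actual upstream code.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: (summed loop dims, constant), e.g. `d2 + d5` -> ([2, 5], 0).
Expr = Tuple[List[int], int]

# Unit loop dims of the example: d0, d2, d3 (from the output) and d5, d6 (from %arg1).
UNIT_DIMS: Set[int] = {0, 2, 3, 5, 6}


def fold_unit_dims(expr: Expr, unit_dims: Set[int]) -> Expr:
    """Replace every unit loop dim by 0, its only possible value."""
    dims, const = expr
    return [d for d in dims if d not in unit_dims], const


def droppable_dims(results: List[Expr], shape: List[int], unit_dims: Set[int]) -> List[int]:
    """Size-1 operand dims whose folded index expression is the constant 0."""
    droppable = []
    for i, (expr, size) in enumerate(zip(results, shape)):
        dims, const = fold_unit_dims(expr, unit_dims)
        if size == 1 and not dims and const == 0:
            droppable.append(i)
    return droppable


def droppable_bare_only(results: List[Expr], shape: List[int], unit_dims: Set[int]) -> List[int]:
    """Simplified model of the earlier behavior: only a bare unit `dN` or a constant 0."""
    droppable = []
    for i, ((dims, const), size) in enumerate(zip(results, shape)):
        bare_unit = len(dims) == 1 and dims[0] in unit_dims
        if size == 1 and const == 0 and (bare_unit or not dims):
            droppable.append(i)
    return droppable


# %arg0 : tensor<1x61x1x1xf32> indexed by (d0, d4, d2 + d5, d3 + d6).
arg0_results = [([0], 0), ([4], 0), ([2, 5], 0), ([3, 6], 0)]
arg0_shape = [1, 61, 1, 1]

print(droppable_bare_only(arg0_results, arg0_shape, UNIT_DIMS))  # prints [0]
print(droppable_dims(arg0_results, arg0_shape, UNIT_DIMS))       # prints [0, 2, 3]
```

In this model, the single-pass check finds all three unit dimensions of `%arg0`. The bare-only check finds just the first, as in the first output above.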

Only repeated application of the transformation yields the final result, in which all unit extent dimensions of `%arg0` are dropped (see `%collapsed_1`):
```mlir
#map = affine_map<(d0, d1) -> (d1)>
#map1 = affine_map<(d0, d1) -> (d0, d1)>
#map2 = affine_map<(d0, d1) -> (d0)>
module {
  func.func @drop_unit_dim_binary_expr(%arg0: tensor<1x61x1x1xf32>, %arg1: tensor<48x61x1x1xf32>, %arg2: tensor<1x48x1x1xf32>) -> tensor<1x48x1x1xf32> {
    %collapsed = tensor.collapse_shape %arg1 [[0], [1, 2, 3]] : tensor<48x61x1x1xf32> into tensor<48x61xf32>
    %collapsed_0 = tensor.collapse_shape %arg2 [[0, 1, 2, 3]] : tensor<1x48x1x1xf32> into tensor<48xf32>
    %collapsed_1 = tensor.collapse_shape %arg0 [[0, 1, 2, 3]] : tensor<1x61x1x1xf32> into tensor<61xf32>
    %0 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "reduction"]} ins(%collapsed_1, %collapsed : tensor<61xf32>, tensor<48x61xf32>) outs(%collapsed_0 : tensor<48xf32>) {
    ^bb0(%in: f32, %in_2: f32, %out: f32):
      %1 = arith.mulf %in, %in_2 : f32
      %2 = arith.addf %out, %1 : f32
      linalg.yield %2 : f32
    } -> tensor<48xf32>
    %expanded = tensor.expand_shape %0 [[0, 1, 2, 3]] output_shape [1, 48, 1, 1] : tensor<48xf32> into tensor<1x48x1x1xf32>
    return %expanded : tensor<1x48x1x1xf32>
  }
}
```
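Once the droppable dimensions are known, each operand is collapsed with `tensor.collapse_shape`. The hypothetical Python sketch below reproduces every reassociation shown in the outputs above. Each dropped dimension is grouped with the next kept dimension, and trailing dropped dimensions join the last group. This grouping rule is inferred from those outputs, not taken from the upstream implementation.

```python
from typing import List


def reassociation(rank: int, dropped: List[int]) -> List[List[int]]:
    """Group each dropped dim with the next kept dim; trailing dropped dims join the last group."""
    groups: List[List[int]] = []
    pending: List[int] = []
    for d in range(rank):
        pending.append(d)
        if d not in dropped:
            groups.append(pending)
            pending = []
    if pending:
        if not groups:
            return []  # every dim dropped: collapse to a rank-0 tensor
        groups[-1].extend(pending)
    return groups


def collapsed_shape(shape: List[int], dropped: List[int]) -> List[int]:
    """Static shape that remains after removing the dropped unit dims."""
    return [size for i, size in enumerate(shape) if i not in dropped]


print(reassociation(4, [0]))        # %arg0, first pass: prints [[0, 1], [2], [3]]
print(reassociation(4, [0, 2, 3]))  # %arg0, this PR:    prints [[0, 1, 2, 3]]
print(reassociation(4, [2, 3]))     # %arg1:             prints [[0], [1, 2, 3]]
print(collapsed_shape([1, 61, 1, 1], [0, 2, 3]))  # prints [61]
```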

Before #171796, this issue was hidden because the transformation was applied repeatedly by a greedy driver. With the changes in this PR, the transformation drops these unit extent dimensions on the first application, achieving the same result without repeated application.

https://github.com/llvm/llvm-project/pull/173873