[Mlir-commits] [mlir] [mlir][linalg] Fix for bias handling for Winograd (PR #110331)
llvmlistbot at llvm.org
Mon Sep 30 13:37:38 PDT 2024
================
@@ -837,9 +837,25 @@ Value outputTransform(RewriterBase &rewriter, Location loc, Value value,
Value widthOffset =
builder.create<affine::AffineApplyOp>(loc, affineMap, tileWIter);
+ // Handling bias.
+ Value prevVal =
+ extract2DDataFrom4D(builder, loc, args[0], NIter, FIter, heightOffset,
+ widthOffset, retRows, retCols,
+ /*loopNorFIdx=*/0,
+ /*loopCorFIdx=*/3, /*heightIdx=*/1,
+ /*widthIdx=*/2);
+ Value biasedVal =
+ builder
+ .create<linalg::AddOp>(
+ loc, prevVal.getType(), ValueRange{matmulRetValue, prevVal},
+ ValueRange{builder.create<tensor::EmptyOp>(
+ loc, llvm::cast<ShapedType>(prevVal.getType()).getShape(),
+ elementType)})
+ .getResult(0);
+
----------------
Max191 wrote:
It would be good not to generate lots of extra ops when possible. I see that having the `scalarFactor`s prevents using the init slice as the init value for the last matmul, but when there is no `scalarFactor`, it would be good to use the init slice directly as the out argument for the last matmul; see the sketch below.
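A minimal sketch of that, assuming no `scalarFactor` and meant to slot into the same function as the diff above; `lhs`/`rhs` are hypothetical stand-ins for the actual transform-matrix and tile operands, while `prevVal`, `builder`, and `loc` come from the diff:
```cpp
// Sketch only: pass the extracted init slice `prevVal` as the outs
// operand so the bias add becomes the matmul's accumulation, instead of
// creating a tensor.empty init plus a separate linalg.add.
Value matmulRetValue =
    builder
        .create<linalg::MatmulOp>(loc, prevVal.getType(),
                                  /*inputs=*/ValueRange{lhs, rhs},
                                  /*outputs=*/ValueRange{prevVal})
        .getResult(0);
```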
Also, in the case where there is a `scalarFactor`, the broadcast + mul + add could all be combined into a single linalg.generic. Something like:
```
%res = linalg.generic {
indexing_maps = [
affine_map<(d0, d1) -> (d0, d1)>,
affine_map<(d0, d1) -> ()>,
affine_map<(d0, d1) -> (d0, d1)>],
iterator_types = ["parallel", "parallel"]}
ins(%a, %b : tensor<...>, f32) outs(%init_slice : tensor<...>) {
^bb0(%in: f32, %in_0: f32, %out: f32):
  %5 = arith.mulf %in, %in_0 : f32
  %6 = arith.addf %5, %out : f32
  linalg.yield %6 : f32
} -> tensor<...>
```
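And a minimal sketch of how that fused generic might be emitted with the C++ builder API, again assuming it slots into the same function as the diff; `scalarFactor` is assumed to be an `f32` Value as discussed, and `prevVal`, `matmulRetValue`, `builder`, and `loc` come from the diff:
```cpp
// Sketch only: fuse broadcast(scalarFactor) * matmulRetValue + init slice
// into one linalg.generic, accumulating into the extracted slice
// `prevVal` rather than a fresh tensor.empty.
Type resultType = prevVal.getType();
int64_t rank = llvm::cast<ShapedType>(resultType).getRank();
SmallVector<AffineMap> indexingMaps = {
    builder.getMultiDimIdentityMap(rank),           // matmulRetValue
    AffineMap::get(rank, 0, builder.getContext()),  // scalarFactor: () map
    builder.getMultiDimIdentityMap(rank)};          // init slice (outs)
SmallVector<utils::IteratorType> iteratorTypes(rank,
                                               utils::IteratorType::parallel);
Value biasedVal =
    builder
        .create<linalg::GenericOp>(
            loc, resultType,
            /*inputs=*/ValueRange{matmulRetValue, scalarFactor},
            /*outputs=*/ValueRange{prevVal}, indexingMaps, iteratorTypes,
            [](OpBuilder &b, Location nestedLoc, ValueRange args) {
              Value mul = b.create<arith::MulFOp>(nestedLoc, args[0], args[1]);
              Value add = b.create<arith::AddFOp>(nestedLoc, mul, args[2]);
              b.create<linalg::YieldOp>(nestedLoc, add);
            })
        .getResult(0);
```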
https://github.com/llvm/llvm-project/pull/110331
More information about the Mlir-commits mailing list