<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/57227>57227</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[MLIR] foreach tiling miscompiles when output comes from extract.slice
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
ThomasRaoux
</td>
</tr>
</table>
<pre>
Here is an example:
```
module {
func.func @matmul(%A: tensor<4xf32>, %B: tensor<16xf32>) -> tensor<4xf32> {
%B1 = tensor.extract_slice %B[10] [4] [1] : tensor<16xf32> to tensor<4xf32>
%result = linalg.generic {indexing_maps = [
affine_map<(d0) -> (d0)>,affine_map<(d0) -> (d0)>],
iterator_types = ["parallel"]}
ins(%A : tensor<4xf32>) outs(%B1 : tensor<4xf32>) {
^bb0(%arg3: f32, %arg4: f32): // no predecessors
%2 = arith.addf %arg3, %arg3 : f32
linalg.yield %2 : f32
} -> tensor<4xf32>
return %result : tensor<4xf32>
}
transform.with_pdl_patterns {
^bb0(%arg0: !pdl.operation):
transform.sequence %arg0 failures(propagate) {
^bb1(%arg1: !pdl.operation):
%0 = transform.structured.match ops{["linalg.generic"]} in %arg1
%1:2 = transform.structured.tile_to_foreach_thread_op %0 num_threads [2] (mapped to dims [0])
}
}
}
```
running with `mlir-opt --test-transform-dialect-interpreter -canonicalize file.mlir` gives:
```
#map0 = affine_map<(d0) -> (d0 * 2)>
#map1 = affine_map<(d0) -> (d0 * 2 + 10)>
#map2 = affine_map<(d0) -> (d0)>
module {
func.func @matmul(%arg0: tensor<4xf32>, %arg1: tensor<16xf32>) -> tensor<4xf32> {
%c2 = arith.constant 2 : index
%0 = tensor.extract_slice %arg1[10] [4] [1] : tensor<16xf32> to tensor<4xf32>
%1 = scf.foreach_thread (%arg2) in (%c2) -> (tensor<4xf32>) {
%2 = affine.apply #map0(%arg2)
%3 = tensor.extract_slice %arg0[%2] [2] [1] : tensor<4xf32> to tensor<2xf32>
%4 = affine.apply #map1(%arg2)
%5 = tensor.extract_slice %arg1[%4] [2] [1] : tensor<16xf32> to tensor<2xf32>
%6 = linalg.generic {indexing_maps = [#map2, #map2], iterator_types = ["parallel"]} ins(%3 : tensor<2xf32>) outs(%5 : tensor<2xf32>) {
^bb0(%arg3: f32, %arg4: f32):
%7 = arith.addf %arg3, %arg3 : f32
linalg.yield %7 : f32
} -> tensor<2xf32>
scf.foreach_thread.perform_concurrently {
tensor.parallel_insert_slice %6 into %0[%4] [2] [1] : tensor<2xf32> into tensor<4xf32>
}
} {thread_dim_mapping = [0]}
return %1 : tensor<4xf32>
}
}
```
The offset of `tensor.parallel_insert_slice` should have been `%arg2 * 2` but we get `%arg2 * 2 + 10` which causes a miscompile.
This is happening because the foreachthread tiling code assumes the interface generates an ExtractSliceOp for the output operand:
https://github.com/llvm/llvm-project/blob/750ee8d56d0a9f8e93c032ad55841c46b1fbbbd7/mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp#L343
There is always an `ExtractSliceOp` but it may get merged with a producer ExtractSliceOp op if it exists:
https://github.com/llvm/llvm-project/blob/750ee8d56d0a9f8e93c032ad55841c46b1fbbbd7/mlir/lib/Dialect/Linalg/Utils/Utils.cpp#L349
I don't see an obvious solution as I don't see how to re-create those offsets within the general tiling function. One way to do it would be to stop composing the ExtractSliceOp but this wouldn't be very robust.
@nicolasvasilache @ftynse any thoughts?
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzNV1mP2zYQ_jX2C2FBh2XLD37YZBM0QIoAbfpsUNLIYkGRKkntrvvrO0NJtuVjs4vkoYahg5yLc32jXJeH7W9ggAnLuGLwwptWwix5mIWPs_BhtgqHv39tdNlJYLP1h_6dsapTRUAXNluGDXdNJ2dxNovTB5TBHCirzSz5uHypkniWfJrFHxlufphsRqvj7oYt8H7Nd66SeQkRmyWPA2EAL87wwu2sFAX0CpAinKWPDB-Wwz3y95uamdM3jD1XaMB20nmlUigu98EeFBhRkGlClfAi1H7X8NZ6GlR3YmeMV5VQQNsoHx1Uhsezjm-9d95ImD4i7bkC4cBwp83OHVo4mRDHLTdcSsCoxMS1fpxwKTtEi90J14bpzg1E3ud3qCbxQYeln_I87Nm42SfER8R9-HFleVzZ0BOufsY_U5q1BkoowKIOey7SRyH2J-NGuDrgZVmxUf5RcMJGyRPeIWYHAbIcBV2QoW_uZN-JxoDrjJrkwy2HDPRHb48LmKXKVto0wTOeYNeWctdyh6FT9tyDF94LScksjpA80C3FWWjVO-5k2Um0hX86UH0dEDeruJAd2osCW6NbvucOrmLmdUZHndFbdPqQhH0hntQ70xXoJSgD7AdFzXRrSZPPxmnpHHMSE5GNeqfSyY74vgYnJOyc3uE68KLeuRrv5U63vWWqa4YlS-UQ-wYQZ1hdLZRU86Vo_E7oK2ozSYbrKB4fpl2xv5pOKWwBjCLLcK-Rwix069hi4cC6xdH-RSm4hMIthMLIY7rjlS0KrrQSBZfiX2AVHisgASiH7cUTxu7YkW9aECd4pj4SP-ogeH1g8dBHzrijd3Dj9QOLwmsh8ZuEnPO9A1LGUriHKmPa_hSwFOctptDKOq4c69uFb_MT6vBVFPIG_Wog6uNkiyqYJj07OomC21dU5g907vw3Ne9jn_VxDLBa5IENOTbRcsGV_Mgboe8CaTz4Ib7rj-Utd8RX7vBal_dsjV6zNX1L5Ej6D229Hbs7xq7eNUH0NdWnd__ogf8daH-C-GRqdXwT4tP7RD-H8FdIvv5VSL6-RXaN5bficV1EAUId9egdVn7RGQPKUTpNj87GtBm9vUMngzlLnxW6HZOBGsRbs2i0r-d8pQVMsKk_Kdo3AB_iGWVQS0A0ZEV4OfadZph789z7gO97DUxXlQWHN0K-17xDgGZr3WHsav4ELAdQxDPU6QBOSJR3jj0D26PUy-0RfZDquRY4YRS8s1gGnDXCFrppCT2nFuLXDf5rwn0P0jl4HubQ9iEDhi6KEwURFLoExq3tGhRMVB6sK16QSYqqD_wH06e-c_xJZ_vWkixPjRXV4gH85KTKY_7XzrUeyv20u8fc73IEmQZfpHwabwuc0v7GAQFfc6lzvK3TECAr01UZ8k2VwSYpwiTmZZpmy6hYrvKoyvO8XCOpnxpQjiC-x37SwKevvmrw4fs4hlh68YcNirbF_vI1WSYXcR2-C-UzP_jTosunBx4jJRxr-MFHqwGzx-HKj0Ecp3nE9wIHnAtH4YQmKmLDrmed_T956C9MATveT77ZnPvmCytpHF47ZgHIMzp_ErqzzGrZ0aiMucOmRLV-JoAwsCgw0RzlnrZj5VjvL8RsSp4-weSYijQKkciAfVPAMBR-cNXkvGdfSDnQinXoU8p-bYmLBF34nALlqBQ8W28Z8j6BOTCj8866SdXg9IUDqZbcPnErJJYI0ERWuQPWM575QCfo9jVF7zObwzZarcIsycJwOS-3SblJNnzuhJOwxTb0-9cvf1DTG6ptPNypYtGsGnvBUDm4iCuV0Q0bwDnw_WPeGbl9d5IILGT6_vmcruN4Pa-3K_wlfBWGiB9FGaX5Mi-yZRlnGVQQ5flc8hyk3faoquCZeRE9rM7FNg7jOMyiLErTJF4GZVzmURaXq5Bn4aoq0U3Q4FdXQHYE2uznZutNyru9xU1JOX_axD4j9gq8n0g-79CzZvu91g23f3Ddvcy9-q03_z_hEiAs">