[Mlir-commits] [mlir] [mlir][tensor] Fix runtime verification for tensor.extract_slice for empty tensor slices (PR #166569)
llvmlistbot at llvm.org
llvmlistbot at llvm.org
Wed Nov 5 06:30:26 PST 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-mlir-tensor
@llvm/pr-subscribers-mlir
Author: Hanumanth (Hanumanth04)
<details>
<summary>Changes</summary>
I hit another runtime verification issue while working with TFLite models. The verifier is incorrectly rejecting `tensor.extract_slice` operations when extracting an empty slice (size=0) that starts exactly at the tensor boundary.
The current runtime verification unconditionally enforces `offset < dim_size`. This makes sense for non-empty slices, but it's too strict for empty slices, causing false positives that lead to spurious runtime assertions.
**Simple example that demonstrates the issue:**
```mlir
func.func @<!-- -->extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) {
// When called with: tensor size=10, offset=10, size=0
// Runtime verification fails: "offset 0 is out-of-bounds"
%slice = tensor.extract_slice %tensor[%offset] [%size] [1]
: tensor<?xf32> to tensor<?xf32>
return
}
```
The check evaluates `10 < 10` which is false, so verification fails. However, I believe this operation should be valid - we're extracting zero elements, so there's no actual out-of-bounds access.
**Real-world repro from TensorFlow Lite model:**
This issue manifests in actual TFLite model lowering. Here's a simplified version showing the problematic pattern:
In this code, `%extracted_slice_0` becomes an empty tensor when SSA value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`. The operation extracts zero elements along dimension 0, which is semantically valid but fails runtime verification.
```mlir
func.func @<!-- -->simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%c10 = arith.constant 10 : index
%c-1 = arith.constant -1 : index
%0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32>
%1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32>
%2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32>
%3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32>
%4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32>
%5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32>
%c4_1 = tosa.const_shape {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1>
%6:2 = scf.while (%arg1 = %0, %arg2 = %arg0)
: (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) {
%7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
%extracted = tensor.extract %7[] : tensor<i1>
scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32>
} do {
^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>):
%7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32>
// First slice
%8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
%9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
%extracted_0 = tensor.extract %9[%c0] : tensor<3xi32>
%10 = index.casts %extracted_0 : i32 to index
%11 = arith.cmpi eq, %10, %c-1 : index
%12 = arith.select %11, %c10, %10 : index
%extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1]
: tensor<10x4x1xf32> to tensor<?x4x1xf32>
// Second slice - this is where the failure occurs
%13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
%14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
%extracted_1 = tensor.extract %14[%c0] : tensor<3xi32>
%15 = index.castu %extracted_1 : i32 to index
%16 = arith.subi %c10, %15 : index // size = 10 - offset
%extracted_2 = tensor.extract %14[%c1] : tensor<3xi32>
%17 = index.castu %extracted_2 : i32 to index
%extracted_3 = tensor.extract %14[%c2] : tensor<3xi32>
%18 = index.castu %extracted_3 : i32 to index
// On the last loop iteration: %15=10, %16=0
// %extracted_slice_0 becomes an empty tensor
// Runtime verification fails: "offset 0 is out-of-bounds"
%extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1]
: tensor<10x4x1xf32> to tensor<?x4x1xf32>
%19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32}
: (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32>
scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32>
}
return %6#<!-- -->1 : tensor<10x4x1xf32>
}
```
**Why I feel this should be allowed:**
TFLite's `tfl.slice` uses the rule: `offset + size <= dim_size`
For our case: `10 + 0 <= 10` is true, so TFLite accepts it.
Python/NumPy also allows this:
```python
arr = np.arange(10)
empty = arr[10:10] # Valid, returns []
```
Position `len(arr)` is a valid starting point for an empty slice - you're not actually reading anything, so the boundary position is safe.
**The fix:**
Make the offset check conditional on slice size:
- Empty slice (size == 0): allow `0 <= offset <= dim_size`
- Non-empty slice (size > 0): require `0 <= offset < dim_size`
**Question for reviewers:**
Should we also relax the static verifier to allow this edge case? Currently, the static verifier would reject:
```mlir
%tensor = arith.constant dense<1.0> : tensor<10xf32>
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32>
```
Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change - this PR focuses only on the runtime verification fix for dynamic shapes.
P.S. We have a similar issue with `memref.subview`. I will send a separate patch for the issue.
---
Full diff: https://github.com/llvm/llvm-project/pull/166569.diff
2 Files Affected:
- (modified) mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp (+52-33)
- (modified) mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir (+9)
``````````diff
diff --git a/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp b/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
index 753cb95b1c906..d35f458cbdb36 100644
--- a/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
+++ b/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
@@ -155,13 +155,15 @@ struct ExtractSliceOpInterface
RankedTensorType sourceType = extractSliceOp.getSource().getType();
// For each dimension, assert that:
- // 0 <= offset < dim_size
- // 0 <= offset + (size - 1) * stride < dim_size
+ // For empty slices (size == 0) : 0 <= offset <= dim_size
+ // For non-empty slices (size > 0): 0 <= offset < dim_size
+ // 0 <= offset + (size - 1) * stride <
+ // dim_size
Value zero = arith::ConstantIndexOp::create(builder, loc, 0);
Value one = arith::ConstantIndexOp::create(builder, loc, 1);
for (int64_t i : llvm::seq<int64_t>(0, sourceType.getRank())) {
- // Reset insertion point to before the operation for each dimension
+
builder.setInsertionPoint(extractSliceOp);
Value offset = getValueOrCreateConstantIndexOp(
@@ -170,46 +172,63 @@ struct ExtractSliceOpInterface
builder, loc, extractSliceOp.getMixedSizes()[i]);
Value stride = getValueOrCreateConstantIndexOp(
builder, loc, extractSliceOp.getMixedStrides()[i]);
-
- // Verify that offset is in-bounds.
Value dimSize = builder.createOrFold<tensor::DimOp>(
loc, extractSliceOp.getSource(), i);
- Value offsetInBounds =
- generateInBoundsCheck(builder, loc, offset, zero, dimSize);
- cf::AssertOp::create(builder, loc, offsetInBounds,
+
+ // Verify that offset is in-bounds (conditional on slice size).
+ Value sizeIsZero = arith::CmpIOp::create(
+ builder, loc, arith::CmpIPredicate::eq, size, zero);
+ auto offsetCheckIf = scf::IfOp::create(
+ builder, loc, sizeIsZero,
+ [&](OpBuilder &b, Location loc) {
+ // For empty slices, offset can be at the boundary: 0 <= offset <=
+ // dimSize.
+ Value offsetGEZero = arith::CmpIOp::create(
+ b, loc, arith::CmpIPredicate::sge, offset, zero);
+ Value offsetLEDimSize = arith::CmpIOp::create(
+ b, loc, arith::CmpIPredicate::sle, offset, dimSize);
+ Value emptyOffsetValid =
+ arith::AndIOp::create(b, loc, offsetGEZero, offsetLEDimSize);
+ scf::YieldOp::create(b, loc, emptyOffsetValid);
+ },
+ [&](OpBuilder &b, Location loc) {
+ // For non-empty slices, offset must be a valid index: 0 <= offset <
+ // dimSize.
+ Value offsetInBounds =
+ generateInBoundsCheck(b, loc, offset, zero, dimSize);
+ scf::YieldOp::create(b, loc, offsetInBounds);
+ });
+
+ Value offsetCondition = offsetCheckIf.getResult(0);
+ cf::AssertOp::create(builder, loc, offsetCondition,
generateErrorMessage(op, "offset " +
std::to_string(i) +
" is out-of-bounds"));
- // Only verify if size > 0
+ // Verify that the slice endpoint is in-bounds (only for non-empty
+ // slices).
Value sizeIsNonZero = arith::CmpIOp::create(
builder, loc, arith::CmpIPredicate::sgt, size, zero);
+ auto ifOp = scf::IfOp::create(
+ builder, loc, sizeIsNonZero,
+ [&](OpBuilder &b, Location loc) {
+ // Verify that slice does not run out-of-bounds.
+ Value sizeMinusOne = arith::SubIOp::create(b, loc, size, one);
+ Value sizeMinusOneTimesStride =
+ arith::MulIOp::create(b, loc, sizeMinusOne, stride);
+ Value lastPos =
+ arith::AddIOp::create(b, loc, offset, sizeMinusOneTimesStride);
+ Value lastPosInBounds =
+ generateInBoundsCheck(b, loc, lastPos, zero, dimSize);
+ scf::YieldOp::create(b, loc, lastPosInBounds);
+ },
+ [&](OpBuilder &b, Location loc) {
+ Value trueVal =
+ arith::ConstantOp::create(b, loc, b.getBoolAttr(true));
+ scf::YieldOp::create(b, loc, trueVal);
+ });
- auto ifOp = scf::IfOp::create(builder, loc, builder.getI1Type(),
- sizeIsNonZero, /*withElseRegion=*/true);
-
- // Populate the "then" region (for size > 0).
- builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
-
- // Verify that slice does not run out-of-bounds.
- Value sizeMinusOne = arith::SubIOp::create(builder, loc, size, one);
- Value sizeMinusOneTimesStride =
- arith::MulIOp::create(builder, loc, sizeMinusOne, stride);
- Value lastPos =
- arith::AddIOp::create(builder, loc, offset, sizeMinusOneTimesStride);
- Value lastPosInBounds =
- generateInBoundsCheck(builder, loc, lastPos, zero, dimSize);
- scf::YieldOp::create(builder, loc, lastPosInBounds);
-
- // Populate the "else" region (for size == 0).
- builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
- Value trueVal =
- arith::ConstantOp::create(builder, loc, builder.getBoolAttr(true));
- scf::YieldOp::create(builder, loc, trueVal);
-
- builder.setInsertionPointAfter(ifOp);
Value finalCondition = ifOp.getResult(0);
-
cf::AssertOp::create(
builder, loc, finalCondition,
generateErrorMessage(
diff --git a/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir b/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
index a77fa310a3699..745eea37f7fca 100644
--- a/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
+++ b/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
@@ -39,6 +39,11 @@ func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>, %dim_0: index,
return
}
+func.func @extract_slice_empty_tensor(%arg0: tensor<10x4x1xf32>, %dim_0: index, %dim_1: index, %dim_2: index, %offset: index) {
+ tensor.extract_slice %arg0[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
+ return
+}
+
func.func @main() {
%0 = arith.constant 0 : index
@@ -115,5 +120,9 @@ func.func @main() {
%dim_2 = arith.constant 1 : index
func.call @extract_slice_zero_size_dim(%cst10x4x1xf32, %dim_0, %dim_1, %dim_2) : (tensor<10x4x1xf32>, index, index, index) -> ()
+ // CHECK-NOT: ERROR: Runtime op verification failed
+ %offset = arith.constant 10 : index
+ func.call @extract_slice_empty_tensor(%cst10x4x1xf32, %dim_0, %dim_1, %dim_2, %offset) : (tensor<10x4x1xf32>, index, index, index, index) -> ()
+
return
}
``````````
</details>
https://github.com/llvm/llvm-project/pull/166569
More information about the Mlir-commits
mailing list