[Mlir-commits] [mlir] [MLIR][SCF] Fix loopUnrollByFactor for unsigned loops with narrow integer types (PR #189001)
llvmlistbot at llvm.org
llvmlistbot at llvm.org
Fri Mar 27 06:53:27 PDT 2026
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-mlir-scf
Author: Mehdi Amini (joker-eph)
<details>
<summary>Changes</summary>
`loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds, which sign-extends the constant to `int64_t`. For unsigned `scf.for` loops with narrow integer types (e.g. i1, i2, i3), this produces wrong results: a bound such as `1 : i1` has `getSExtValue() == -1` but should be treated as `1` (unsigned).
Two bugs were introduced by this:
1. **Wrong epilogue detection**: the comparison `upperBoundUnrolledCst < ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the sign-extended i1 value 1) evaluated to false, suppressing the epilogue that should execute the remaining iterations.
2. **Zero step after overflow**: when `tripCountEvenMultiple == 0` (all iterations go to the epilogue), `stepUnrolledCst = stepCst * unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A zero step causes `constantTripCount` to return `nullopt`, preventing the zero-trip main loop from being elided.
Fix:
- Use zero-extension (`getZExtValue`) instead of sign-extension when reading bounds for unsigned loops.
- When `tripCountEvenMultiple == 0`, keep the original step for the main loop to avoid the zero-step issue (the step value is irrelevant for a zero-trip loop anyway).
Fixes #<!-- -->163743
Assisted-by: Claude Code
---
Full diff: https://github.com/llvm/llvm-project/pull/189001.diff
2 Files Affected:
- (modified) mlir/lib/Dialect/SCF/Utils/Utils.cpp (+24-5)
- (modified) mlir/test/Dialect/SCF/loop-unroll.mlir (+112)
``````````diff
diff --git a/mlir/lib/Dialect/SCF/Utils/Utils.cpp b/mlir/lib/Dialect/SCF/Utils/Utils.cpp
index bae6e68400e22..eb5c40da13985 100644
--- a/mlir/lib/Dialect/SCF/Utils/Utils.cpp
+++ b/mlir/lib/Dialect/SCF/Utils/Utils.cpp
@@ -387,9 +387,24 @@ FailureOr<UnrolledLoopInfo> mlir::loopUnrollByFactor(
std::optional<APInt> constTripCount = forOp.getStaticTripCount();
if (constTripCount) {
// Constant loop bounds computation.
- int64_t lbCst = getConstantIntValue(forOp.getLowerBound()).value();
- int64_t ubCst = getConstantIntValue(forOp.getUpperBound()).value();
- int64_t stepCst = getConstantIntValue(forOp.getStep()).value();
+ bool isUnsignedLoop = forOp.getUnsignedCmp();
+ // For unsigned loops, the bounds must be zero-extended rather than
+ // sign-extended: narrow integer types (e.g. i1, i2, i3) can have bit
+ // patterns representing valid unsigned values but negative in a signed
+ // context (e.g., i1 value 1 has getSExtValue() == -1 but
+ // getZExtValue() == 1). This is safe as long as the unsigned bound fits
+ // in int64_t, which holds for all narrow integer types (< 64 bits).
+ auto getLoopBound = [&](Value v) -> int64_t {
+ auto apInt = getConstantAPIntValue(v);
+ assert(apInt && "expected constant loop bound");
+ assert((!isUnsignedLoop || apInt->first.getBitWidth() < 64) &&
+ "unsigned 64-bit loop bounds are not supported");
+ return isUnsignedLoop ? static_cast<int64_t>(apInt->first.getZExtValue())
+ : apInt->first.getSExtValue();
+ };
+ int64_t lbCst = getLoopBound(forOp.getLowerBound());
+ int64_t ubCst = getLoopBound(forOp.getUpperBound());
+ int64_t stepCst = getLoopBound(forOp.getStep());
if (unrollFactor == 1) {
if (constTripCount->isOne() &&
failed(forOp.promoteIfSingleIteration(rewriter)))
@@ -412,9 +427,13 @@ FailureOr<UnrolledLoopInfo> mlir::loopUnrollByFactor(
else
upperBoundUnrolled = forOp.getUpperBound();
- // Create constant for 'stepUnrolled'.
+ // Create constant for 'stepUnrolled'. When the main loop has zero
+ // iterations (tripCountEvenMultiple == 0), keep the original step.
+ // Computing stepCst * unrollFactor could overflow a narrow integer type and
+ // wrap to zero; a zero step causes constantTripCount to return nullopt
+ // instead of 0, which prevents the zero-trip main loop from being elided.
stepUnrolled =
- stepCst == stepUnrolledCst
+ (tripCountEvenMultiple == 0 || stepCst == stepUnrolledCst)
? step
: arith::ConstantOp::create(boundsBuilder, loc,
boundsBuilder.getIntegerAttr(
diff --git a/mlir/test/Dialect/SCF/loop-unroll.mlir b/mlir/test/Dialect/SCF/loop-unroll.mlir
index 4c72d9e99d049..248bd62414955 100644
--- a/mlir/test/Dialect/SCF/loop-unroll.mlir
+++ b/mlir/test/Dialect/SCF/loop-unroll.mlir
@@ -518,3 +518,115 @@ func.func @loop_unroll_static_yield_value_defined_above(%init: i32) {
// UNROLL-OUTER-BY-2: %[[SUM1:.*]] = arith.andi %[[INIT]], %[[SUM]]
// UNROLL-OUTER-BY-2: scf.yield %[[SUM1]], %[[INIT]] : i32, i32
+// -----
+
+// Test unrolling an unsigned scf.for whose bounds use narrow integer types.
+// The unsigned upper bound 2 in i2 has the same bit pattern as signed -2, so
+// getConstantIntValue (which sign-extends) would return -2. The fix ensures we
+// zero-extend when the loop comparison is unsigned.
+
+// Case 1: trip count 2 (i2 unsigned: lb=0, ub=2, step=1), unroll by 2 =>
+// fully unrolled, no residual loop.
+func.func @unroll_unsigned_i2_tc2() -> (i32, i32) {
+ %0 = arith.constant 7 : i32
+ %lb = arith.constant 0 : i2
+ %ub = arith.constant 2 : i2
+ %step = arith.constant 1 : i2
+ %result:2 = scf.for unsigned %i = %lb to %ub step %step
+ iter_args(%arg0 = %0, %arg1 = %0) -> (i32, i32) : i2 {
+ %add = arith.addi %arg0, %arg1 : i32
+ %mul = arith.muli %arg0, %arg1 : i32
+ scf.yield %add, %mul : i32, i32
+ }
+ return %result#0, %result#1 : i32, i32
+}
+// UNROLL-BY-2-LABEL: @unroll_unsigned_i2_tc2
+// UNROLL-BY-2-NOT: scf.for
+// UNROLL-BY-2: arith.addi
+// UNROLL-BY-2: arith.muli
+// UNROLL-BY-2: arith.addi
+// UNROLL-BY-2: arith.muli
+// UNROLL-BY-2: return
+
+// -----
+
+// Case 2: trip count 4 (i3 unsigned: lb=0, ub=4, step=1), unroll by 2 =>
+// main loop with step 2 and 2 copies of body, no epilogue.
+func.func @unroll_unsigned_i3_tc4() -> (i32, i32) {
+ %0 = arith.constant 7 : i32
+ %lb = arith.constant 0 : i3
+ %ub = arith.constant 4 : i3
+ %step = arith.constant 1 : i3
+ %result:2 = scf.for unsigned %i = %lb to %ub step %step
+ iter_args(%arg0 = %0, %arg1 = %0) -> (i32, i32) : i3 {
+ %add = arith.addi %arg0, %arg1 : i32
+ %mul = arith.muli %arg0, %arg1 : i32
+ scf.yield %add, %mul : i32, i32
+ }
+ return %result#0, %result#1 : i32, i32
+}
+// UNROLL-BY-2-LABEL: @unroll_unsigned_i3_tc4
+// UNROLL-BY-2: scf.for unsigned %{{.*}} = %{{.*}} to %{{.*}} step %c2_i3
+// UNROLL-BY-2: arith.addi
+// UNROLL-BY-2: arith.muli
+// UNROLL-BY-2: arith.addi
+// UNROLL-BY-2: arith.muli
+// UNROLL-BY-2-NOT: scf.for
+// UNROLL-BY-2: return
+
+// -----
+
+// Case 3: trip count 1 (i1 unsigned: lb=0, ub=1, step=1), unroll by 2 =>
+// trip count not divisible by 2, so epilogue handles the single iteration;
+// no residual loop.
+func.func @unroll_unsigned_i1_tc1() -> (i32, i32) {
+ %0 = arith.constant 7 : i32
+ %lb = arith.constant 0 : i1
+ %ub = arith.constant 1 : i1
+ %step = arith.constant 1 : i1
+ %result:2 = scf.for unsigned %i = %lb to %ub step %step
+ iter_args(%arg0 = %0, %arg1 = %0) -> (i32, i32) : i1 {
+ %add = arith.addi %arg0, %arg1 : i32
+ %mul = arith.muli %arg0, %arg1 : i32
+ scf.yield %add, %mul : i32, i32
+ }
+ return %result#0, %result#1 : i32, i32
+}
+// UNROLL-BY-2-LABEL: @unroll_unsigned_i1_tc1
+// UNROLL-BY-2-NOT: scf.for
+// UNROLL-BY-2: arith.addi
+// UNROLL-BY-2: arith.muli
+// UNROLL-BY-2: return
+
+// -----
+
+// Case 4: trip count 5 (i3 unsigned: lb=0, ub=5, step=1), unroll by 2 =>
+// main loop runs twice with step 2 (4 of 5 iterations), epilogue runs once.
+// In i3, ub=5 has the same bit pattern as signed -3; with the old sign-extended
+// ubCst=-3 the comparison upperBoundUnrolledCst(4) < ubCst(-3) would be false,
+// suppressing the epilogue. This tests Bug 1 independently of Bug 2.
+func.func @unroll_unsigned_i3_tc5_with_epilogue() -> (i32, i32) {
+ %0 = arith.constant 7 : i32
+ %lb = arith.constant 0 : i3
+ %ub = arith.constant 5 : i3
+ %step = arith.constant 1 : i3
+ %result:2 = scf.for unsigned %i = %lb to %ub step %step
+ iter_args(%arg0 = %0, %arg1 = %0) -> (i32, i32) : i3 {
+ %add = arith.addi %arg0, %arg1 : i32
+ %mul = arith.muli %arg0, %arg1 : i32
+ scf.yield %add, %mul : i32, i32
+ }
+ return %result#0, %result#1 : i32, i32
+}
+// UNROLL-BY-2-LABEL: @unroll_unsigned_i3_tc5_with_epilogue
+// Main loop (4 of 5 iterations unrolled, step=2, body appears twice).
+// UNROLL-BY-2: scf.for unsigned %{{.*}} = %{{.*}} to %{{.*}} step %c2_i3
+// UNROLL-BY-2: arith.addi
+// UNROLL-BY-2: arith.muli
+// UNROLL-BY-2: arith.addi
+// UNROLL-BY-2: arith.muli
+// Epilogue (1 remaining iteration, promoted out of loop).
+// UNROLL-BY-2-NOT: scf.for
+// UNROLL-BY-2: arith.addi
+// UNROLL-BY-2: arith.muli
+// UNROLL-BY-2: return
``````````
</details>
https://github.com/llvm/llvm-project/pull/189001
More information about the Mlir-commits
mailing list