[Mlir-commits] [mlir] [mlir][affine] Use value bound inference to determine minimum/maximum trip counts in loop analysis (PR #128113)

Wed Apr 9 06:22:35 PDT 2025

linuxlonelyeagle wrote:

@krzysz00, you know GPUs very well, so I'd still welcome your ideas for improvement. The following replies to the comments left by @ftynse; none of the comments above have been resolved yet. Several of them raise questions with the same answer, so I'll give a unified answer here.

> I don't understand the need for min/max bound logic here.
> it's unclear to me why we would need to reason about the upper bound.

`maxTripCount` is always greater than or equal to `tripCount`; in the CPU case, the two are equal. The original commit also removed invalid loops, but that change has since been dropped from this PR.
```cpp
std::optional<uint64_t> tripCount = getConstantTripCount(forOp);
std::optional<uint64_t> maxTripCount = getMaxConstantTripCount(forOp);
```
* `tripCount = 0`, `maxTripCount = 1`: keep the loop in this case.

* `tripCount = 1`, `maxTripCount = 1`: move the loop body out of the loop (promote it) and drop the loop.

* General case, `tripCount > 1`: emit the loop body `tripCount` times outside the loop, then keep a single loop for the remaining iterations. That remaining loop is equivalent to an `if` statement guarding the boundary, since only some threads execute the code in it.

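* Core idea
  The lower bound is the thread ID, whose range is [minThreadId, maxThreadId] = [0, blockSize - 1], so:
  * `tripCount = (upper - maxThreadId) ceildiv step = (upper - (blockSize - 1)) ceildiv step`
  * `maxTripCount = (upper - minThreadId) ceildiv step = (upper - 0) ceildiv step`

  The same rules apply to any value whose range is bounded, not just thread IDs.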

* Role of the min bound
**The min bound (`tripCount`) determines the unroll factor.**

* Role of the max bound
**The max bound (`maxTripCount`) determines whether to keep the last loop (which is equivalent to an `if` statement).**
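The case analysis and the two roles above can be summarized in a small C++ sketch. It assumes a single lower bound whose range [lbMin, lbMax] is already known (for example a thread ID in [0, blockSize - 1]), a constant upper bound, and a positive step. `tripCountFor`, `LoopPlan`, and `planLoop` are hypothetical names for illustration, not MLIR APIs; the actual PR derives these bounds through value bound inference on affine maps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an MLIR API): number of iterations of
// `for (iv = lb; iv < ub; iv += step)`, i.e. (ub - lb) ceildiv step, clamped at 0.
int64_t tripCountFor(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "step must be positive");
  if (ub <= lb)
    return 0;
  return (ub - lb + step - 1) / step;
}

// Hypothetical summary of how the loop would be transformed.
struct LoopPlan {
  int64_t minTripCount;   // "tripCount": iterations every thread executes
  int64_t maxTripCount;   // iterations of the thread with the smallest lower bound
  int64_t unrollFactor;   // copies of the body emitted without a guard
  bool keepRemainderLoop; // some threads execute more than unrollFactor iterations
};

// The lower bound is assumed to lie in [lbMin, lbMax].
LoopPlan planLoop(int64_t lbMin, int64_t lbMax, int64_t ub, int64_t step) {
  LoopPlan plan;
  // The largest lower bound gives the fewest iterations (the min bound).
  plan.minTripCount = tripCountFor(lbMax, ub, step);
  // The smallest lower bound gives the most iterations (the max bound).
  plan.maxTripCount = tripCountFor(lbMin, ub, step);
  // Min bound -> unroll factor.
  plan.unrollFactor = plan.minTripCount;
  // Max bound -> keep the last loop only if some thread runs extra iterations.
  plan.keepRemainderLoop = plan.maxTripCount > plan.minTripCount;
  return plan;
}
```

With `lbMin = 0`, `lbMax = 3`, `ub = 2`, `step = 4` this gives `tripCount = 0` and `maxTripCount = 1` (keep the loop); with `lbMin = 0`, `lbMax = 1`, `ub = 2`, `step = 2` it gives an unroll factor of 1 and no remainder loop (promote the body); with `lbMin = 0`, `lbMax = 3`, `ub = 10`, `step = 4` it gives an unroll factor of 2 plus a remainder loop.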

> whether the value is known to always be equal

In the following case, the two trip counts are equal:
```mlir
// block size = 2, min thread ID = 0, max thread ID = 1
%thread_id = gpu.thread_id x
affine.for %iv = %thread_id to 2 step 2 {
  // use %iv
}
// max trip count = (2 - 0) ceildiv 2 = 1
// min trip count = (2 - 1) ceildiv 2 = 1
```
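To sanity-check this example, the trip count can be brute-forced for every thread ID and compared with the closed form `(ub - lb) ceildiv step` evaluated at the largest and smallest thread ID. This is a sketch, not part of the PR: it assumes the lower bound is a thread ID in [0, blockSize) with blockSize >= 1, a constant upper bound, and a positive step. `simulateTripCount`, `ceilDivTripCount`, `bruteForceTripRange`, and `formulaTripRange` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: count iterations of `for (iv = lb; iv < ub; iv += step)` by running it.
int64_t simulateTripCount(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "step must be positive");
  int64_t count = 0;
  for (int64_t iv = lb; iv < ub; iv += step)
    ++count;
  return count;
}

// Closed form: (ub - lb) ceildiv step, clamped at 0.
int64_t ceilDivTripCount(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "step must be positive");
  return ub <= lb ? 0 : (ub - lb + step - 1) / step;
}

struct TripRange {
  int64_t min;
  int64_t max;
};

// Run the loop once per thread ID in [0, blockSize) and record the extremes.
TripRange bruteForceTripRange(int64_t blockSize, int64_t ub, int64_t step) {
  TripRange r{std::numeric_limits<int64_t>::max(), std::numeric_limits<int64_t>::min()};
  for (int64_t tid = 0; tid < blockSize; ++tid) {
    int64_t n = simulateTripCount(tid, ub, step);
    r.min = std::min(r.min, n);
    r.max = std::max(r.max, n);
  }
  return r;
}

// Closed form at maxThreadId = blockSize - 1 (min) and minThreadId = 0 (max).
TripRange formulaTripRange(int64_t blockSize, int64_t ub, int64_t step) {
  return {ceilDivTripCount(blockSize - 1, ub, step), ceilDivTripCount(0, ub, step)};
}
```

For `blockSize = 2`, `ub = 2`, `step = 2`, both functions return [1, 1]. Because the trip count never increases as the lower bound grows (for a fixed upper bound and positive step), the extremes over all thread IDs are reached at maxThreadId and minThreadId, which is why the brute-force and closed-form ranges agree.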
> Can you provide a mathematical justification as to why this provides a correct (and tight?) upper bound?

I may not be able to give a formal mathematical proof. But for the max trip count, you are right: `std::min` should not be used; `std::max` should be used instead.

For example, suppose the loop's affine map has two results, whose trip counts lie in the ranges [4, 6] and [2, 3]. Previously the combined result would have been [2, 3], which is clearly not correct; now it is [2, 6]. The lower end, 2, gives the unroll factor, and the upper end, 6, shows that a loop guarding the boundary must be kept.
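The combination rule from the previous paragraph can be sketched in C++, assuming the trip-count range of each map result has already been computed. `TripRange`, `combineTripRanges`, and `combineWithMinOnly` are hypothetical names used only for illustration; `combineWithMinOnly` mirrors the earlier `std::min`-only behavior described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type: range of trip counts contributed by one map result.
struct TripRange {
  int64_t min;
  int64_t max;
};

// Rule described above: smallest lower bound and largest upper bound across results.
TripRange combineTripRanges(const std::vector<TripRange> &ranges) {
  assert(!ranges.empty() && "need at least one result");
  TripRange combined = ranges.front();
  for (const TripRange &r : ranges) {
    combined.min = std::min(combined.min, r.min);
    combined.max = std::max(combined.max, r.max);
  }
  return combined;
}

// Earlier behavior: std::min on both ends, which undercounts the max bound.
TripRange combineWithMinOnly(const std::vector<TripRange> &ranges) {
  assert(!ranges.empty() && "need at least one result");
  TripRange combined = ranges.front();
  for (const TripRange &r : ranges) {
    combined.min = std::min(combined.min, r.min);
    combined.max = std::min(combined.max, r.max);
  }
  return combined;
}
```

`combineTripRanges({{4, 6}, {2, 3}})` returns [2, 6], while `combineWithMinOnly` returns [2, 3]. The first result is the smallest interval covering every per-result range.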
> Could we avoid using GPU dialect operations here? 

I'd still prefer to keep the current test, since it covers my use case, and this PR was originally designed with GPUs in mind.

Further discussion is welcome.

https://github.com/llvm/llvm-project/pull/128113