[Mlir-commits] [clang] [llvm] [mlir] [openmp] [LoopTiling][Clang][MLIR] Canonical Intra-tile Loops (PR #191114)

Thu Apr 23 05:17:46 PDT 2026

amitamd7 wrote:

> This PR applies `PredicatedNeeded` regardless of whether the intra-tile loop actually needs to be canonical. A canonical intra-tile loop is needed to make, e.g., this work:
> 
> ```
> #pragma omp for collapse(2)
> #pragma omp tile sizes(2)
> for (int i = 0; i < n; ++i)
> ```
> 
> but it is not needed for, e.g.,
> 
> ```
> #pragma omp for collapse(1)
> #pragma omp tile sizes(2)
> for (int i = 0; i < n; ++i)
> ```
> 
> The `min(.floor.iv + DimTileSize, NumIterations)` solution would generally be preferable if the loop does not need to be canonical.
> 
> Could you check whether
> 
> 1. LLVM optimizes both to the same code anyway. Look at the LoopBoundSplit pass, though it should not even be necessary if ScalarEvolution can derive the BackedgeTakenCount as if we had computed it explicitly with the min-expression. Possibly some nsw flags help as well.
> 2. you can derive whether the loop needs to be canonical. This could be a crude heuristic, such as whether there is any other directive applied to the `getTransformed()`, regardless of how deep it goes.
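
A minimal sketch of item 2 under a toy model, not Clang's AST: `struct consumer`, `needs_canonical_crude`, and `needs_canonical_by_depth` are hypothetical names. The crude variant follows the review's suggestion (any directive on top forces canonical intra-tile loops). The depth-based variant is an assumption: `tile` with `k` sizes emits `k` floor loops followed by `k` intra-tile loops, so an outer directive only reaches an intra-tile loop if it associates more than `k` loops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the directive applied to the tile's generated loops. */
struct consumer {
    int associated_loops; /* e.g. collapse(k) associates k loops */
};

/* Crude heuristic from the review: any directive applied on top of the
   transformed loops means the intra-tile loop must be canonical. */
static bool needs_canonical_crude(const struct consumer *outer) {
    return outer != NULL;
}

/* Assumed finer rule: tiled_dims floor loops come first, so the outer
   directive touches an intra-tile loop only if it associates more than
   tiled_dims loops. */
static bool needs_canonical_by_depth(int tiled_dims,
                                     const struct consumer *outer) {
    assert(tiled_dims > 0);
    return outer != NULL && outer->associated_loops > tiled_dims;
}
```

For the two examples above (`tile sizes(2)`, so one tiled dimension), `collapse(2)` yields `true` under both rules, while `collapse(1)` yields `true` under the crude rule but `false` under the depth-based one, which is exactly the case where the `min(...)` form could be kept.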

That could be a useful optimization. 
The IR differs as follows:
With the min condition:
- the partial last tile is shrunk to fewer iterations via `smin.i32(i32 %8, i32 %0)`, i.e. `min(floor + DimTileSize, N)`

With the predicate guard:
- the loop runs to a fixed bound (4 here): `icmp eq i32 %15, 4`
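
For reference, a hand-written C sketch of the two shapes for a single tiled loop (a reading of the IR above, not Clang's literal output; `sum_min`, `sum_predicated`, and the summation body are hypothetical stand-ins for the loop and its body):

```c
#include <assert.h>

/* min form: the intra-tile upper bound depends on the floor IV, so the
   last tile simply runs fewer iterations. */
static long sum_min(int n, int tile) {
    assert(n >= 0 && tile > 0);
    long acc = 0;
    for (int fl = 0; fl < n; fl += tile) {
        int ub = fl + tile < n ? fl + tile : n; /* min(floor + tile, n) */
        for (int i = fl; i < ub; ++i)
            acc += i;
    }
    return acc;
}

/* Predicated form: every tile runs exactly `tile` iterations (a constant
   trip count), and a guard skips the body past n in the last tile. */
static long sum_predicated(int n, int tile) {
    assert(n >= 0 && tile > 0);
    long acc = 0;
    for (int fl = 0; fl < n; fl += tile) {
        for (int t = 0; t < tile; ++t) {
            int i = fl + t;
            if (i < n) /* predicate guard */
                acc += i;
        }
    }
    return acc;
}
```

Both compute the same result (e.g. 45 for `n = 10`, `tile = 4`); the difference is only whether the intra-tile trip count is constant, which is what a `collapse(2)` over the floor/intra-tile pair relies on.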

The SCEV output confirms this as well. So LLVM does not optimize both to the same IR; hence,
we may need to add a check that elides the predicate guard (restoring the min condition) when canonicalization is not required. One concern, though: aren't we over-complicating this just to save executing the remainder iterations in the last tile?
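
To put a number on that concern, a small cost model for one tiled loop of `n` iterations (a sketch; `guarded_off_iters` and `guard_evaluations` are hypothetical helpers, and it assumes the optimizer does not peel or split the last tile):

```c
#include <assert.h>

/* Guarded-off (no-op) iterations: only the last tile can be partial, so
   this is 0 when tile divides n and tile - n % tile otherwise. */
static long guarded_off_iters(long n, long tile) {
    assert(n >= 0 && tile > 0);
    return (tile - n % tile) % tile;
}

/* Guard evaluations: with the predicate, the check runs in every
   iteration of every tile, i.e. ceil(n / tile) * tile times. */
static long guard_evaluations(long n, long tile) {
    assert(n >= 0 && tile > 0);
    return ((n + tile - 1) / tile) * tile;
}
```

For `n = 10` and a tile size of 4 this gives 2 guarded-off iterations out of 12 guard evaluations: the remainder waste is bounded by `tile - 1`, while the guard cost grows with `n`.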

https://github.com/llvm/llvm-project/pull/191114