[Openmp-commits] [clang] [flang] [llvm] [openmp] [Clang][OpenMP][LoopTransformations] Add support for "#pragma omp fuse" loop transformation directive and "looprange" clause (PR #139293)
    Michael Kruse via Openmp-commits 
    openmp-commits at lists.llvm.org
       
    Mon Jul 28 05:39:06 PDT 2025
    
    
  
Meinersbur wrote:
> Maybe there is no need to recursively represent all the nested transformation?
I don't see a need to represent the entire loop structure. The only info that I think is potentially needed is the nest depth for transformations that generate a loop nest (e.g. tile), and for transformations that generate loop sequences it is the number of loops in that sequence (e.g. `split` for index-set-splitting, and by construction also every construct with a `looprange` clause).
The only use case for 2130117f92e51df73ac8c4b7e37f7f89178a89f2 was to identify whether -- while being in a non-fully instantiated template context -- there are additional generated loops or not. Within a dependend context `getTransformedStmt()` may return null because e.g. arguments of the `looprange()` clause might be template parameters. 
```c++
      Stmt *TransformedStmt = Dir->getTransformedStmt();
      if (!TransformedStmt) {
        unsigned NumGeneratedLoops = Dir->getNumGeneratedLoops();
        if (NumGeneratedLoops == 0) {
          // May happen if the loop transformation does not result in a
          // generated loop (such as full unrolling).
          break;
        }
        if (NumGeneratedLoops > 0) {
          // The loop transformation construct has generated loops, but these
          // may not have been generated yet due to being in a dependent
          // context.
          return true;
        }
      }
```
Outside of a template context `getTransformedStmt()` returns the fully expanded loops which contain the nesting information that can be analyzed by recursing into it, so no separate generated loop info should be necessary. Eventually the template will be used in a instantiated context (or it's completly unused) where it can be analyzed. Until then, all analysis can be defererred (which ` return true;` does). 
Since it is typically easily obtainable (e.g. by the number of items in a `sizes` clause for tile), the number of generated loops can be used for additional semantic checking even in a templated (/dependent) context. This is what the `break` above does: Since we know there are no further loops even in an instantiated context, we do not need to defer semantic checks until later, and analyze the loop nesting we already have.
I imagine those numbers could be used for better diagnostic output in general, e.g. when using `collapse(2)` on a `#pragma omp unroll partial`, hinting the user that the transformation does not produce a sufficient number of loops, instead of pointing out that the unrolled loop body is not a valid canonical loop[^1]. However, I think tracking the entire loop nest for better diagnostic messages is overblown, and may not even be accurate. For instance:
```c
for (int i = 1; i < 20; i+=1)
  #pragma omp tile sizes(i)
  for (int j = 1; j < n; j+=1)
```
Just recording the loop nesting of this may result in `gls = [ 3 ]`. Ovever, the tile size depends on the outer loop which the OpenMP restrictions on canonical loop nests do not allow. One may track the dependencies through all the clauses and loop bounds, but I think it is sufficient to let `checkOpenMPLoop` diagnose all those restrictions once `getTransformedStmt()` is guarenteed to be available.
[^1]: In the OpenMP spec the unrolled body is never a valid canonical loop, even with `partial(1)`, nor a loop sequence, a concept that was added after unroll. I am not sure what the use case would be for considering the unrolled loop body to be a canonical loop sequence.
https://github.com/llvm/llvm-project/pull/139293
    
    
More information about the Openmp-commits
mailing list