[clang] [flang] [llvm] [openmp] [Clang][OpenMP][LoopTransformations] Add support for "#pragma omp fuse" loop transformation directive and "looprange" clause (PR #139293)

Tue Jul 15 09:16:04 PDT 2025

rofirrim wrote:

I'm a bit uncertain about what we want to do with `NumGeneratedLoopNests` and `NumGeneratedLoops`.

I understand that, outside of dependent contexts, this is some sort of synthesised attribute (in the base case from analysing the loop nests / canonical loop sequences) that can be used by an enclosing loop transformation to check it is still valid.

I wonder if an alternative approach would be to use a list of integers, one per loop nest in the sequence, each giving the depth of that canonical loop nest. For lack of a better name, let's call this the GeneratedLoopSequence (`gls` in the examples; read the examples bottom-up).

```cpp
// after unroll gls = [], because it is not partial and there may not be a loop anymore
#pragma omp unroll 
// after fuse gls = [ 1 ]
#pragma omp fuse
// from syntax gls = [ 1, 1 ]
{
   for (...) { }
   for (...) { }
}
```

```cpp
// after fuse gls = [ 6, 1 ]
#pragma omp fuse looprange(2, 2)
// from syntax gls = [ 6, 1, 1 ]
{
   // after tile gls = [ 6 ]
   #pragma omp tile sizes(x, y, z)
   // from syntax gls = [ 3 ]
   for (...) {  
      for (...) { 
        for (...) { 
        } 
      }
   }
   // from syntax gls = [ 1 ]
   for (...) { }
   // from syntax gls = [ 1 ]
   for (...) { }
}
```

```cpp
// after split gls = [ 1, 1]
#pragma omp split counts(a, b)
// from syntax, gls = [ 1 ] 
for (...) { }
```

(For dependent contexts I was thinking of making the GeneratedLoopSequence a `std::optional`, so that it is explicitly absent and can be told apart from `[]`.)
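
To make the idea concrete, here is a minimal sketch of how the list could be updated by each transformation. All names (`GeneratedLoopSequence` as an alias, `applyFullUnroll`, `applyTile`, `applyFuse`, `applySplit`) are hypothetical and do not exist in Clang. The sketch assumes that a fused loop is recorded with depth 1, as in the examples above, and that each loop generated by `split` keeps the depth of the input nest.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sketch: none of these names exist in Clang.
// Each element is the depth of one canonical loop nest in the sequence.
// std::nullopt models a dependent context, which is distinct from the empty list [].
using GeneratedLoopSequence = std::optional<std::vector<unsigned>>;

// Full (non-partial) unroll: no loop is guaranteed to remain, so the result is [].
GeneratedLoopSequence applyFullUnroll(const GeneratedLoopSequence &In) {
  if (!In)
    return std::nullopt;
  return std::vector<unsigned>{};
}

// tile sizes(...) with NumSizes sizes: needs a single nest that is at least
// NumSizes deep, and adds NumSizes generated loops to it.
GeneratedLoopSequence applyTile(const GeneratedLoopSequence &In,
                                unsigned NumSizes) {
  if (!In)
    return std::nullopt;
  assert(In->size() == 1 && In->front() >= NumSizes);
  return std::vector<unsigned>{In->front() + NumSizes};
}

// fuse looprange(First, Count), with First being 1-based.
// Assumption: the fused loop is recorded with depth 1.
GeneratedLoopSequence applyFuse(const GeneratedLoopSequence &In,
                                std::size_t First, std::size_t Count) {
  if (!In)
    return std::nullopt;
  assert(First >= 1 && Count >= 1 && First - 1 + Count <= In->size());
  std::vector<unsigned> Out;
  for (std::size_t I = 0; I < First - 1; ++I)
    Out.push_back((*In)[I]);
  Out.push_back(1);
  for (std::size_t I = First - 1 + Count; I < In->size(); ++I)
    Out.push_back((*In)[I]);
  return Out;
}

// split counts(...) with NumCounts counts.
// Assumption: every generated loop keeps the depth of the input nest.
GeneratedLoopSequence applySplit(const GeneratedLoopSequence &In,
                                 std::size_t NumCounts) {
  if (!In)
    return std::nullopt;
  assert(In->size() == 1);
  return std::vector<unsigned>(NumCounts, In->front());
}
```

With these helpers, `applyFuse` on `[6, 1, 1]` with `looprange(2, 2)` gives `[6, 1]`, and a dependent input stays absent instead of becoming `[]`.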

But I wonder if this approach is enough. I was considering the `apply` clause, for when we get to implement it. Maybe a list of integers is not enough?

```cpp
// after apply(unroll) gls = []
// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b) apply(unroll)
// from syntax, gls = [ 1 ] 
for (...) { }
```

```cpp
// after apply(unroll(2)), non-partial unroll of the second loop, gls = [1, ?not a loop anymore? ] 
// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b) apply(unroll(2))
// from syntax, gls = [ 1 ] 
for (...) { }
```

```cpp
// after apply(split(2) counts(c, d)), gls = [1, [1, 1] ] (?)
// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b) apply(split(2) counts(c, d))
// from syntax, gls = [ 1 ] 
for (...) { }
```

```cpp
// after apply(split counts(c, d)), gls = [[1, 1], [1, 1]] (???)
// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b) apply(split counts(c, d))
// from syntax, gls = [ 1 ] 
for (...) { }
```

Maybe there is no need to recursively represent all the nested transformations?
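
One way to probe that question is to check whether the nested cases above still make sense once flattened. The sketch below is purely illustrative: `GlsEntry` and `flatten` are hypothetical names, and it assumes that an enclosing transformation only needs the flat list of remaining loop nests, which may well not hold.

```cpp
#include <cassert>
#include <vector>

// Hypothetical representation, not part of Clang.
struct GlsEntry {
  // Depth of the loop nest; 0 means "not a loop anymore".
  unsigned Depth = 0;
  // Non-empty if a nested transformation replaced this loop by a sequence.
  std::vector<GlsEntry> Nested;
};

// Flattens a nested sequence into a plain list of depths, dropping the
// entries that are no longer loops.
std::vector<unsigned> flatten(const std::vector<GlsEntry> &Seq) {
  std::vector<unsigned> Out;
  for (const GlsEntry &E : Seq) {
    if (!E.Nested.empty()) {
      std::vector<unsigned> Inner = flatten(E.Nested);
      Out.insert(Out.end(), Inner.begin(), Inner.end());
    } else if (E.Depth > 0) {
      Out.push_back(E.Depth);
    }
  }
  return Out;
}
```

Under that assumption, `[1, [1, 1]]` would flatten to `[1, 1, 1]` and `[1, ?not a loop anymore?]` to `[1]`, so a plain list of integers would be enough after each `apply`.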

Other examples, from OpenMP, seem OK:

```cpp
void span_apply(double A[128][128])
{
  // this is not a loop transformation but this is fine because gls is a singleton
  // and collapse is 2 ≤ 4
  #pragma omp for collapse(2)
  // from apply(grid: interchange, reverse) (this affects the first two loops) gls = [ 4 ] 
  // from tile gls = [ 4 ]
  #pragma omp tile sizes(16,16) apply(grid: interchange,reverse)
  // from syntax gls = [ 2 ]
  for (int i = 0; i < 128; ++i)
    for (int j = 0; j < 128; ++j)
       A[i][j] = A[i][j] + 1;
}
```

```cpp
void nested_apply(double A[100])
{
  // after apply(reverse), gls = [ 2 ]
  // after apply(intratile: unroll partial(2)), gls = [ 2 ]
  // after tile: gls = [ 2 ]
  #pragma omp tile sizes(10) apply(intratile: unroll partial(2) apply(reverse))
  // from syntax, gls = [ 1 ]
  for (int i = 0; i < 100; ++i)
     A[i] = A[i] + 1;
}
```
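
The check that an enclosing non-transformation directive would perform in the `span_apply` example (gls is a singleton and the `collapse` value does not exceed its depth) could be sketched as follows. `collapseIsValid` is a hypothetical helper, not an existing Sema function, and returning `std::nullopt` for an absent gls is an assumption about how dependent contexts would defer the check.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper, not part of Clang.
// Returns std::nullopt when gls is absent (dependent context), meaning the
// check has to be deferred until instantiation.
std::optional<bool>
collapseIsValid(const std::optional<std::vector<unsigned>> &Gls,
                unsigned CollapseCount) {
  if (!Gls)
    return std::nullopt;
  // Valid only for a single loop nest that is deep enough.
  return Gls->size() == 1 && CollapseCount <= Gls->front();
}
```

For `span_apply`, gls is `[ 4 ]` and `collapse(2)` passes; an empty gls (for example after a full unroll) or a sequence with more than one nest would be rejected.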

Thoughts?

https://github.com/llvm/llvm-project/pull/139293