[flang-commits] [flang] [flang][OpenMP] Only HLFIR base in privatization logic (PR #84123)

Fri Mar 8 08:47:07 PST 2024

jeanPerier wrote:

Your solution looks acceptable to me.

In the future, the clone creation could be based on HLFIR helpers (like hlfir::createTempFromMold) using #0 instead of going to the fir::ExtendedValue level and having custom code in the bridge. But these helpers may not do exactly what is needed for OpenMP currently (e.g, they will use allocmem for arrays, while the clone code creates allocas).

> Also, I am happy to help in general with cleaning up references to `#1` in general (not just privatization). Let me know if you are interested, so that we can align on this somehow.

In the case where `#0` and `#1` are both fir.box/fir.class, this could make sense given the commit that anyway "merged" them in HLFIR codegeneration. However, when `#1` is not a fir.box, replacing it by `#0` would most likely pessimize the generated code because FIR codegen for operations using fir.box is usually much more expensive/harder to optimize (by default, FIR codegen will assumed addressing of fir.box is not contiguous), so this would pessimize codes like this when going from HLFIR down to FIR (and later LLVM):

```
subroutine foo(x, y)
  real :: x(0:99), y(0:99)
  x = y
end subroutine
```

from the following FIR (-emit-fir of the loop body):
```
    fir.do_loop %arg2 = %c1 to %c100 step %c1 unordered {
      %5 = arith.addi %arg2, %c-1 : index
      %6 = fir.array_coor %3(%0) %5 : (!fir.ref<!fir.array<100xf32>>, !fir.shapeshift<1>, index) -> !fir.ref<f32>
      %7 = fir.load %6 : !fir.ref<f32>
      %8 = arith.addi %arg2, %c-1 : index
      %9 = fir.array_coor %1(%0) %8 : (!fir.ref<!fir.array<100xf32>>, !fir.shapeshift<1>, index) -> !fir.ref<f32>
      fir.store %7 to %9 : !fir.ref<f32>
    }
```
to

```
    fir.do_loop %arg2 = %c1 to %c100 step %c1 unordered {
      %5 = arith.addi %arg2, %c-1 : index
      %6 = fir.array_coor %4(%0) %5 : (!fir.box<!fir.array<100xf32>>, !fir.shapeshift<1>, index) -> !fir.ref<f32>
      %7 = fir.load %6 : !fir.ref<f32>
      %8 = arith.addi %arg2, %c-1 : index
      %9 = fir.array_coor %2(%0) %8 : (!fir.box<!fir.array<100xf32>>, !fir.shapeshift<1>, index) -> !fir.ref<f32>
      fir.store %7 to %9 : !fir.ref<f32>
    }
```

Because `hlfir::translateToExtendedValue` is used in the [HLFIR to FIR pass](https://github.com/llvm/llvm-project/blame/c54e0524eeffcf2c5428cd2bc0449a1989e82cd8/flang/lib/Optimizer/HLFIR/Transforms/ConvertToFIR.cpp#L78) to get rid of fir.box usages as much as possible, and use simple FIR operations as much as possible.

To be fair, LLVM with optimizations enabled does a good job at removing the temporary descriptors and replacing the byte stride addressing in the simple cases where it can see the descriptor creation. But I would not jump on changing this without a great confidence that increasing the descriptor usages in FIR operations (as opposed to HLFIR) won't be a problem for more complex optimizations and programs.

https://github.com/llvm/llvm-project/pull/84123