[Mlir-commits] [llvm] [mlir] [MLIR][OpenMP] Lowering nontemporal clause to LLVM IR for SIMD directive (PR #118751)

Wed Mar 12 07:13:39 PDT 2025

kiranchandramohan wrote:

> I'm not sure that a callback to MLIR fully addresses Kiran's concern, because the MLIR dialect is also used by things other than Fortran. This is a difficult problem to solve due to the layering of the different parts of the compiler, and I can't see a nice way to do this.
> 
> @kiranchandramohan which do you think is the least bad option of the two you suggested previously?
> 
> 1. Continuing with special handling of boxes (and the regular memcpys between the boxes) in OpenMPToLLVMIRTranslation
> 2. Your suggestion to do this in the conversion from FIR to LLVM dialect

Thanks @tblah for chiming in here.

Thinking about this again, the loads and stores on the real data are always visible in the HLFIR/FIR code. I assume the current problem is that the MLIR value captured in the nontemporal clause is the descriptor (or its address) rather than the address of the real data. If we had a pass in Flang that moved the nontemporal clause values from the descriptor to the address of the real data, we would not need to reverse-engineer the correct loads and stores during translation.

In the following example, such a pass would move the clause from `nontemporal(%3)` to `nontemporal(%18)`:

```
    omp.simd nontemporal(%3 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) private(@_QFsbEi_private_i32 %2 -> %arg2 : !fir.ref<i32>) {
      omp.loop_nest (%arg3) : i32 = (%c1_i32) to (%c100_i32) inclusive step (%c1_i32) {
        %11 = fir.declare %arg2 {uniq_name = "_QFsbEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
        fir.store %arg3 to %11 : !fir.ref<i32>
        %12 = fir.load %3 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
        %13 = fir.load %11 : !fir.ref<i32>
        %14 = fir.convert %13 : (i32) -> i64
        %15 = fir.box_addr %12 : (!fir.box<!fir.heap<!fir.array<?xi32>>>) -> !fir.heap<!fir.array<?xi32>>
        %16:3 = fir.box_dims %12, %c0 : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
        %17 = fir.shape_shift %16#0, %16#1 : (index, index) -> !fir.shapeshift<1>
        %18 = fir.array_coor %15(%17) %14 : (!fir.heap<!fir.array<?xi32>>, !fir.shapeshift<1>, i64) -> !fir.ref<i32>
        %19 = fir.load %18 : !fir.ref<i32>
        %20 = fir.load %4 : !fir.ref<i32>
        %21 = arith.addi %19, %20 : i32
        fir.store %21 to %18 : !fir.ref<i32>
        omp.yield
      }
    }
```
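To make the proposal a bit more concrete, here is a toy Python model of the dataflow such a pass would follow in the IR above. Starting from the descriptor reference, it goes through `fir.load`, `fir.box_addr` and `fir.array_coor` to the element address that is actually loaded from and stored to. This is not Flang or MLIR code. The `Op` class and the `users`/`data_addresses` helpers are hypothetical, and a real pass would work on MLIR operations and would have to handle access patterns this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Op:
    """One SSA operation in a hand-written toy model of the loop body (hypothetical)."""
    result: Optional[str]      # SSA value defined by the op, None for e.g. fir.store
    name: str                  # operation name, e.g. "fir.load"
    operands: Tuple[str, ...]  # SSA operands in textual order


def users(ops, value, name):
    """Results of every op called `name` whose first operand is `value`."""
    return [op.result for op in ops
            if op.name == name and op.operands and op.operands[0] == value]


def data_addresses(ops, descriptor_ref):
    """Trace descriptor_ref -> fir.load -> fir.box_addr -> fir.array_coor and
    return the element addresses that are really read (fir.load) or written
    (fir.store, whose address is the second operand)."""
    found = set()
    for box in users(ops, descriptor_ref, "fir.load"):
        for base in users(ops, box, "fir.box_addr"):
            for coor in users(ops, base, "fir.array_coor"):
                accessed = any(
                    (op.name == "fir.load" and op.operands[0] == coor)
                    or (op.name == "fir.store" and op.operands[1] == coor)
                    for op in ops
                )
                if accessed:
                    found.add(coor)
    return found


# The loop body from the example, reduced to names and operands.
body = [
    Op("%11", "fir.declare", ("%arg2",)),
    Op(None, "fir.store", ("%arg3", "%11")),
    Op("%12", "fir.load", ("%3",)),
    Op("%13", "fir.load", ("%11",)),
    Op("%14", "fir.convert", ("%13",)),
    Op("%15", "fir.box_addr", ("%12",)),
    Op("%16", "fir.box_dims", ("%12", "%c0")),
    Op("%17", "fir.shape_shift", ("%16#0", "%16#1")),
    Op("%18", "fir.array_coor", ("%15", "%17", "%14")),
    Op("%19", "fir.load", ("%18",)),
    Op("%20", "fir.load", ("%4",)),
    Op("%21", "arith.addi", ("%19", "%20")),
    Op(None, "fir.store", ("%21", "%18")),
]

print(data_addresses(body, "%3"))  # {'%18'}
```

Non-descriptor values such as `%11` or `%4` yield an empty set, since no `fir.box_addr` is reached from them.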

The original Fortran source:
```
subroutine sb(x, y)
  integer, allocatable :: x(:)
  integer :: y
  allocate(x(100))
  !$omp simd nontemporal(x)
  do i=1,100
  x(i) = x(i) + y
  end do
  !$omp end simd
end subroutine
```

Does such a pass make sense?

https://github.com/llvm/llvm-project/pull/118751