[Mlir-commits] [llvm] [mlir] [OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (PR #78818)

Mon Feb 19 20:50:26 PST 2024

================
@@ -0,0 +1,43 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// A small condensed version of a problem requiring constant alloca raising in
+// Target Region Entries for user injected code, found in an issue in the Flang
+// compiler. Certain LLVM IR optimisation passes will perform runtime breaking 
+// transformations on allocations not found to be in the entry block, current
+// OpenMP dialect lowering of TargetOp's will inject user allocations after
+// compiler generated entry code, in a separate block, this test checks that
+// a small function which attempts to raise some of these (specifically 
+// constant sized) allocations performs its task reasonably in these 
+// scenarios. 
+
+module attributes {omp.is_target_device = true} {
+  llvm.func @_QQmain() attributes {omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>} {
+    %1 = llvm.mlir.constant(1 : i64) : i64
+    %2 = llvm.alloca %1 x !llvm.struct<(ptr)> : (i64) -> !llvm.ptr
+    %3 = omp.map_info var_ptr(%2 : !llvm.ptr, !llvm.struct<(ptr)>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr
+    omp.target map_entries(%3 -> %arg0 : !llvm.ptr) {
+    ^bb0(%arg0: !llvm.ptr):
+      %4 = llvm.mlir.constant(1 : i32) : i32
+      %5 = llvm.alloca %4 x !llvm.struct<(ptr)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+      %6 = llvm.mlir.constant(50 : i32) : i32
+      %7 = llvm.mlir.constant(1 : i64) : i64
+      %8 = llvm.alloca %7 x i32 : (i64) -> !llvm.ptr
+      llvm.store %6, %8 : i32, !llvm.ptr
+      %9 = llvm.mlir.undef : !llvm.struct<(ptr)>
+      %10 = llvm.insertvalue %8, %9[0] : !llvm.struct<(ptr)> 
+      llvm.store %10, %5 : !llvm.struct<(ptr)>, !llvm.ptr
+      %88 = llvm.call @_ExternalCall(%arg0, %5) : (!llvm.ptr, !llvm.ptr) -> !llvm.struct<()>
----------------
agozillon wrote:

I made the following small test case using blocks. It attempts to reproduce the behavior of the C++ test case above, with some extras on top:

```
PROGRAM main
    integer, allocatable :: a
    allocate(a)
    a = 5
 
!$omp target map(tofrom:a)
   sub_call : BLOCK
      INTEGER :: b
      b = a + 2
      CALL Sub(b)

      do i = 1, 10
        BLOCK
           integer :: j 
           j = 10
           j = j + 10
           a = a + j 
        END BLOCK
        a = a + b
     end do
    END BLOCK sub_call
!$omp end target

print *, a
END PROGRAM main
  
SUBROUTINE Sub(B)
  INTEGER :: b
  b = b * b
END SUBROUTINE Sub
```

It yields the same answer for both target and regular Fortran host (moving `j` out of the inner block and into the first block also yields identical results). What is interesting is that when I force undefined behavior by removing the initial assignment to `j`, the regular Fortran host code prints a garbage value, whereas the target returns a consistent value, effectively not emitting the addition of `j` to `a` in the loop. This happens with or without the alloca raise pass here, so it seems to be a separate inconsistency.

I did, however, have to use the deprecated FIR flow via `-flang-deprecated-no-hlfir` (which still uses the alloca raise, so it would still produce the relevant inconsistencies), as this test has unfortunately exposed another compiler crash for allocatables, very similar to the one this PR fixes (it crashes with or without this PR, sadly). So that will be fun to look into. I am unsure at this point whether it is a related or an entirely new issue, but I will look into it more in the near future.
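
For reference, the variant with `j` moved out of the inner block and into the first block would look roughly like the sketch below. It is a reconstruction from the description above, not necessarily the exact file that was run; the only assumed change from the original test is where `j` is declared.

```fortran
PROGRAM main
    integer, allocatable :: a
    allocate(a)
    a = 5

!$omp target map(tofrom:a)
   sub_call : BLOCK
      INTEGER :: b
      ! Assumed variant: j is declared here rather than in a nested BLOCK
      INTEGER :: j
      b = a + 2
      CALL Sub(b)

      do i = 1, 10
        ! j is still assigned before use, so behavior stays well defined
        j = 10
        j = j + 10
        a = a + j
        a = a + b
      end do
    END BLOCK sub_call
!$omp end target

print *, a
END PROGRAM main

SUBROUTINE Sub(B)
  INTEGER :: b
  b = b * b
END SUBROUTINE Sub
```

Declaring `j` in the outer block keeps a single declaration for the whole target region instead of one scoped to each loop iteration, which is why it is a useful comparison point for the nested `BLOCK` version.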


https://github.com/llvm/llvm-project/pull/78818