[PATCH] D142914: [MLIR][OpenMP] Added OMPIRBuilder support for Target Data directives.

Mon Mar 13 15:15:33 PDT 2023

kiranchandramohan added inline comments.

================
Comment at: mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp:1357
+/// Process MapOperands for Target Data directives.
+static LogicalResult processMapOperand(
+    llvm::IRBuilderBase &builder, LLVM::ModuleTranslation &moduleTranslation,
----------------
TIFitis wrote:
> TIFitis wrote:
> > TIFitis wrote:
> > > kiranchandramohan wrote:
> > > > kiranchandramohan wrote:
> > > > > TIFitis wrote:
> > > > > > TIFitis wrote:
> > > > > > > kiranchandramohan wrote:
> > > > > > > > TIFitis wrote:
> > > > > > > > > kiranchandramohan wrote:
> > > > > > > > > > TIFitis wrote:
> > > > > > > > > > > kiranchandramohan wrote:
> > > > > > > > > > > > Isn't it possible to sink this whole function into the OpenMPIRBuilder by passing it a list of `mapOpValue` and `mapTypeFlags`?
> > > > > > > > > > > > `lvm::Value *mapOpValue = moduleTranslation.lookupValue(mapOp);`
> > > > > > > > > > > > 
> > > > > > > > > > > > Did i miss something? Or is this in anticipation of more processing required for other types?
> > > > > > > > > > > I'm not fully sure but we might need more MLIR related things when supporting types other than LLVMPointerType. Also there is a call to mlir::LLVM::createMappingInformation.
> > > > > > > > > > > 
> > > > > > > > > > > I guess it might still be possible to move most of it to the IRBuilder, would you like me to do that?
> > > > > > > > > > Callbacks are useful when there is frontend-specific handling that is required. If more types require to be handled then it is better to have the callback. We can revisit this after all types are handled. I assume, the current handling is for scalars and arrays of known-size.
> > > > > > > > > I am a novice at FORTRAN so I'm not aware of all  the types and scenarios.
> > > > > > > > > 
> > > > > > > > > I've tested the following cases and they work end-to-end:
> > > > > > > > > 
> > > > > > > > > **Fortran:**
> > > > > > > > > ```
> > > > > > > > > subroutine openmp_target_data_region(a)
> > > > > > > > >     real :: a(*)
> > > > > > > > >     integer :: b(1024)
> > > > > > > > >     character :: c
> > > > > > > > >     integer, pointer :: p
> > > > > > > > >     !$omp target enter data map(to: a, b, c, p)
> > > > > > > > > end subroutine openmp_target_data_region
> > > > > > > > > ```
> > > > > > > > > 
> > > > > > > > > **LLVM IR(** `flang-new -fc1 -emit-llvm -fopenmp test.f90 -o test.ll`** ):**
> > > > > > > > > 
> > > > > > > > > ```
> > > > > > > > > ; ModuleID = 'FIRModule'
> > > > > > > > > source_filename = "FIRModule"
> > > > > > > > > target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
> > > > > > > > > target triple = "x86_64-unknown-linux-gnu"
> > > > > > > > > 
> > > > > > > > > %struct.ident_t = type { i32, i32, i32, i32, ptr }
> > > > > > > > > 
> > > > > > > > > @0 = private unnamed_addr constant [13 x i8] c"loc(unknown)\00", align 1
> > > > > > > > > @1 = private unnamed_addr constant [56 x i8] c";/home/akash/Documents/scratch/test2.f90;unknown;3;16;;\00", align 1
> > > > > > > > > @2 = private unnamed_addr constant [56 x i8] c";/home/akash/Documents/scratch/test2.f90;unknown;4;18;;\00", align 1
> > > > > > > > > @3 = private unnamed_addr constant [56 x i8] c";/home/akash/Documents/scratch/test2.f90;unknown;5;25;;\00", align 1
> > > > > > > > > @4 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
> > > > > > > > > @5 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @4 }, align 8
> > > > > > > > > @.offload_maptypes = private unnamed_addr constant [4 x i64] [i64 1, i64 1, i64 1, i64 1]
> > > > > > > > > @.offload_mapnames = private constant [4 x ptr] [ptr @0, ptr @1, ptr @2, ptr @3]
> > > > > > > > > 
> > > > > > > > > declare ptr @malloc(i64)
> > > > > > > > > 
> > > > > > > > > declare void @free(ptr)
> > > > > > > > > 
> > > > > > > > > define void @openmp_target_data_region_(ptr %0) {
> > > > > > > > >   %2 = alloca [4 x ptr], align 8
> > > > > > > > >   %3 = alloca [4 x ptr], align 8
> > > > > > > > >   %4 = alloca [4 x i64], align 8
> > > > > > > > >   %5 = alloca [1024 x i32], i64 1, align 4
> > > > > > > > >   %6 = alloca [1 x i8], i64 1, align 1
> > > > > > > > >   %7 = alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
> > > > > > > > >   %8 = alloca ptr, i64 1, align 8
> > > > > > > > >   store ptr null, ptr %8, align 8
> > > > > > > > >   br label %entry
> > > > > > > > > 
> > > > > > > > > entry:                                            ; preds = %1
> > > > > > > > >   %9 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 0
> > > > > > > > >   store ptr %0, ptr %9, align 8
> > > > > > > > >   %10 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 0
> > > > > > > > >   store ptr %0, ptr %10, align 8
> > > > > > > > >   %11 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 0
> > > > > > > > >   store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %11, align 8
> > > > > > > > >   %12 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 1
> > > > > > > > >   store ptr %5, ptr %12, align 8
> > > > > > > > >   %13 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 1
> > > > > > > > >   store ptr %5, ptr %13, align 8
> > > > > > > > >   %14 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 1
> > > > > > > > >   store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %14, align 8
> > > > > > > > >   %15 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 2
> > > > > > > > >   store ptr %6, ptr %15, align 8
> > > > > > > > >   %16 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 2
> > > > > > > > >   store ptr %6, ptr %16, align 8
> > > > > > > > >   %17 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 2
> > > > > > > > >   store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %17, align 8
> > > > > > > > >   %18 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 3
> > > > > > > > >   store ptr %7, ptr %18, align 8
> > > > > > > > >   %19 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 3
> > > > > > > > >   store ptr %7, ptr %19, align 8
> > > > > > > > >   %20 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 3
> > > > > > > > >   store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %20, align 8
> > > > > > > > >   %21 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 0
> > > > > > > > >   %22 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 0
> > > > > > > > >   %23 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 0
> > > > > > > > >   call void @__tgt_target_data_begin_mapper(ptr @5, i64 -1, i32 4, ptr %21, ptr %22, ptr %23, ptr @.offload_maptypes, ptr @.offload_mapnames, ptr null)
> > > > > > > > >   ret void
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > ; Function Attrs: nounwind
> > > > > > > > > declare void @__tgt_target_data_begin_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #0
> > > > > > > > > 
> > > > > > > > > ; Function Attrs: nounwind
> > > > > > > > > declare void @__tgt_target_data_end_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #0
> > > > > > > > > 
> > > > > > > > > attributes #0 = { nounwind }
> > > > > > > > > 
> > > > > > > > > !llvm.module.flags = !{!0}
> > > > > > > > > 
> > > > > > > > > !0 = !{i32 2, !"Debug Info Version", i32 3}
> > > > > > > > > ```
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > If I am missing some important types here then please let me know, I'll try to see if they work and if not I'll add support for them in further patches.
> > > > > > > > In general how are you passing the size of the fortran variable/type to the OpenMP runtime? For scalars  and arrays with sizes known at compile time, this comes from the type itself. But for other types like assumed-shape arrays, variable length arrays this information comes from the descriptor or from other fields. My question is how is this being collected and passed to the runtime?
> > > > > > > > 
> > > > > > > > For all the types, I see the following code in the IR you gave above for generating the `ArgSizes` argument of `__tgt_target_data_begin_mapper`. I don't understand how the code (and size) be the same for all the types.
> > > > > > > > ```
> > > > > > > > ...
> > > > > > > > %11 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 0
> > > > > > > > store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %11, align 8
> > > > > > > > ...
> > > > > > > > %14 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 1
> > > > > > > > store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %14, align 8
> > > > > > > > ...
> > > > > > > > %17 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 2
> > > > > > > > store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %17, align 8
> > > > > > > > ...
> > > > > > > > %20 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 3
> > > > > > > > store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %20, align 8
> > > > > > > > ...
> > > > > > > > %23 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 0
> > > > > > > > call void @__tgt_target_data_begin_mapper(ptr @5, i64 -1, i32 4, ptr %21, ptr %22, ptr %23, ptr @.offload_maptypes, ptr @.offload_mapnames, ptr null)
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > I would like some more clarity on this before proceeding. Clang generates different code for this and I see that it is appropriately filling the `ArgSizes` field.
> > > > > > > `OpenMPIRBuilder::getSizeInBytes` is the function responsible for calculating the `ArgSizes`.
> > > > > > > 
> > > > > > > For the Value : `%1 = alloca i64, i64 1, align 8` it returns size as `i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64)` and TBH I don't understand this. This function was taken from OpenACC.
> > > > > > > 
> > > > > > > I will re-implement this function and update the patch.
> > > > > > Actually, the generated code seems correct. The first answer [[ https://stackoverflow.com/questions/14608250/how-can-i-find-the-size-of-a-type | here ]] gives insight into how `OpenMPIRBuilder::getSizeInBytes` is calculating the size of the type.
> > > > > > 
> > > > > > Opaque pointers make it look the same for all the different types, disabling opaque pointers you get something like the following:
> > > > > > 
> > > > > > `integer(8) :: a` :
> > > > > > ```
> > > > > >    %7 = getelementptr inbounds [1 x i64], [1 x i64]* %.offload_sizes, i32 0, i32 0
> > > > > >    store i64 ptrtoint (i64** getelementptr (i64*, i64** null, i32 1) to i64), i64* %7, align 4
> > > > > > ```
> > > > > > 
> > > > > > `integer :: b(1024)` :
> > > > > > 
> > > > > > ```
> > > > > >   %8 = getelementptr inbounds [1 x i64], [1 x i64]* %.offload_sizes, i32 0, i32 0
> > > > > >   store i64 ptrtoint ([1024 x i32]** getelementptr ([1024 x i32]*, [1024 x i32]** null, i32 1) to i64), i64* %8, align 4
> > > > > > ```
> > > > > > 
> > > > > > Let me know if this makes sense.
> > > > > > 
> > > > > > Thanks,
> > > > > > Akash
> > > > > Ahh OK. It does make it a bit harder to read.
> > > > > 
> > > > > But going to back to my general question:
> > > > > ```
> > > > > In general how are you passing the size of the fortran variable/type to the OpenMP runtime? For scalars and arrays with sizes known at compile time, this comes from the type itself. But for other types like assumed-shape arrays, variable length arrays this information comes from the descriptor or from other fields. My question is how is this being collected and passed to the runtime
> > > > > ```
> > > > > 
> > > > > Consider the following two subroutines, It has two assumed shape arrays. In sb0 it is a rank-1 array, in sb1 it is a rank-2 array. At the llvm dialect layer, these two will be represented by struct equivalents of Fortran descriptors as given below. If we now find the size of the types, it would get the size of the descriptor struct rather than it memory it is referring to. I guess this is not what we want to do. I believe this would require some special processing, unless the patch also does something for this.
> > > > > 
> > > > > ```
> > > > > omp.target_data   map((to -> %arg0 : !llvm.ptr<struct<(ptr<i32>, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>>))
> > > > > ```
> > > > > 
> > > > > ```
> > > > > omp.target_data   map((to -> %arg0 : !llvm.ptr<struct<(ptr<i32>, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>>))
> > > > > ```
> > > > > 
> > > > > 
> > > > > ```
> > > > > subroutine sb0(a)
> > > > >    integer :: a(:)
> > > > >    !$omp target data map(to: a)
> > > > >    a(10) = 20
> > > > >    !$omp end target data
> > > > > end subroutine
> > > > > 
> > > > > subroutine sb1(a)
> > > > >    integer :: a(:,:)
> > > > >    !$omp target data map(to: a)
> > > > >    a(5,6) = 20
> > > > >    !$omp end target data
> > > > > end subroutine
> > > > > ```
> > > > Just want to clarify that I am not expecting a fix here. But just a statement about what is supported and what is not supported.
> > > In line 1382 you can see how the size is determined. It is done simply by passing the map variable to the `getSizeInBytes` function.
> > > 
> > > I haven't added any special handling for assumed-shape arrays or any other special cases. I am not sure if it is equivalent to an `int*` in C, but looking at the llvm IR, if the map variable is an `i32*` then clang sets the size to 4, and for assumed-shape arrays in your examples `sb0` and `sb1` it also creates `i32*` in the llvm IR and the size for these calculated in the same way.
> > > 
> > > I am not sure if this is the correct behaviour. Do you know what is the correct size that should be passed in your examples to the runtime? If it is relatively straightforward then I can look into adding support for these, otherwise maybe we can add a TODO for handling assumed-shape arrays in a future patch.
> > Sorry, made a mistake in previous comment.
> > 
> > Clang sets size to 8 and not 4 for `i32*`.
> Also, the standard specifies `assumed`-size arrays are not supported. I am not sure if the arrays in your example are assumed-shape or assumed-size.
They are assumed-shape arrays. 

We are probably missing a semantic check for the assumed-size case. But good to know that they need not be supported. In one of the examples, you provided, it was mentioned to be supported. If you can create a ticket for this that would be great. Gfortran catches this error.
```
subroutine openmp_target_data_region(a)
    real :: a(*)
    !$omp target enter data map(to: a)
end subroutine openmp_target_data_region
```

In general, besides assumed-shape, there are various cases where a descriptor can come in. Like for pointers and allocatable arrays. 
```
subroutine sb4
   integer, pointer :: a(:)
   allocate(a(10))
   !$omp target data map(to: a)
   a(10) = 20
   !$omp end target data
end subroutine
```

For array sections:
```
subroutine sb4
   integer :: a(10)
   !$omp target data map(to: a(3:10))
   a(10) = 20
   !$omp end target data
end subroutine
```

I believe the size should be passed to the runtime, then only it can compute how many bytes should be mapped.

>From your answers, 
1. The following types are supported:
scalars, constant sized arrays
2. The following types are probably supported:
variable length arrays, derived types with elements of type in (1) or (2)
3. The following types are not supported:
assumed-size arrays, pointers, allocatables, derived type with elements of type in (3), array-sections (this segfaults now).

Would it be OK to add types in (2) and (3) as TODO failures in `flang`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142914/new/

https://reviews.llvm.org/D142914