[PATCH] D142914: [MLIR][OpenMP] Added OMPIRBuilder support for Target Data directives.

Mon Mar 13 09:10:11 PDT 2023

kiranchandramohan added inline comments.
Herald added a subscriber: sunshaoce.

================
Comment at: mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp:1357
+/// Process MapOperands for Target Data directives.
+static LogicalResult processMapOperand(
+    llvm::IRBuilderBase &builder, LLVM::ModuleTranslation &moduleTranslation,
----------------
kiranchandramohan wrote:
> TIFitis wrote:
> > TIFitis wrote:
> > > kiranchandramohan wrote:
> > > > TIFitis wrote:
> > > > > kiranchandramohan wrote:
> > > > > > TIFitis wrote:
> > > > > > > kiranchandramohan wrote:
> > > > > > > > Isn't it possible to sink this whole function into the OpenMPIRBuilder by passing it a list of `mapOpValue` and `mapTypeFlags`?
> > > > > > > > `lvm::Value *mapOpValue = moduleTranslation.lookupValue(mapOp);`
> > > > > > > > 
> > > > > > > > Did i miss something? Or is this in anticipation of more processing required for other types?
> > > > > > > I'm not fully sure but we might need more MLIR related things when supporting types other than LLVMPointerType. Also there is a call to mlir::LLVM::createMappingInformation.
> > > > > > > 
> > > > > > > I guess it might still be possible to move most of it to the IRBuilder, would you like me to do that?
> > > > > > Callbacks are useful when there is frontend-specific handling that is required. If more types require to be handled then it is better to have the callback. We can revisit this after all types are handled. I assume, the current handling is for scalars and arrays of known-size.
> > > > > I am a novice at FORTRAN so I'm not aware of all  the types and scenarios.
> > > > > 
> > > > > I've tested the following cases and they work end-to-end:
> > > > > 
> > > > > **Fortran:**
> > > > > ```
> > > > > subroutine openmp_target_data_region(a)
> > > > >     real :: a(*)
> > > > >     integer :: b(1024)
> > > > >     character :: c
> > > > >     integer, pointer :: p
> > > > >     !$omp target enter data map(to: a, b, c, p)
> > > > > end subroutine openmp_target_data_region
> > > > > ```
> > > > > 
> > > > > **LLVM IR(** `flang-new -fc1 -emit-llvm -fopenmp test.f90 -o test.ll`** ):**
> > > > > 
> > > > > ```
> > > > > ; ModuleID = 'FIRModule'
> > > > > source_filename = "FIRModule"
> > > > > target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
> > > > > target triple = "x86_64-unknown-linux-gnu"
> > > > > 
> > > > > %struct.ident_t = type { i32, i32, i32, i32, ptr }
> > > > > 
> > > > > @0 = private unnamed_addr constant [13 x i8] c"loc(unknown)\00", align 1
> > > > > @1 = private unnamed_addr constant [56 x i8] c";/home/akash/Documents/scratch/test2.f90;unknown;3;16;;\00", align 1
> > > > > @2 = private unnamed_addr constant [56 x i8] c";/home/akash/Documents/scratch/test2.f90;unknown;4;18;;\00", align 1
> > > > > @3 = private unnamed_addr constant [56 x i8] c";/home/akash/Documents/scratch/test2.f90;unknown;5;25;;\00", align 1
> > > > > @4 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
> > > > > @5 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @4 }, align 8
> > > > > @.offload_maptypes = private unnamed_addr constant [4 x i64] [i64 1, i64 1, i64 1, i64 1]
> > > > > @.offload_mapnames = private constant [4 x ptr] [ptr @0, ptr @1, ptr @2, ptr @3]
> > > > > 
> > > > > declare ptr @malloc(i64)
> > > > > 
> > > > > declare void @free(ptr)
> > > > > 
> > > > > define void @openmp_target_data_region_(ptr %0) {
> > > > >   %2 = alloca [4 x ptr], align 8
> > > > >   %3 = alloca [4 x ptr], align 8
> > > > >   %4 = alloca [4 x i64], align 8
> > > > >   %5 = alloca [1024 x i32], i64 1, align 4
> > > > >   %6 = alloca [1 x i8], i64 1, align 1
> > > > >   %7 = alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
> > > > >   %8 = alloca ptr, i64 1, align 8
> > > > >   store ptr null, ptr %8, align 8
> > > > >   br label %entry
> > > > > 
> > > > > entry:                                            ; preds = %1
> > > > >   %9 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 0
> > > > >   store ptr %0, ptr %9, align 8
> > > > >   %10 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 0
> > > > >   store ptr %0, ptr %10, align 8
> > > > >   %11 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 0
> > > > >   store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %11, align 8
> > > > >   %12 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 1
> > > > >   store ptr %5, ptr %12, align 8
> > > > >   %13 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 1
> > > > >   store ptr %5, ptr %13, align 8
> > > > >   %14 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 1
> > > > >   store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %14, align 8
> > > > >   %15 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 2
> > > > >   store ptr %6, ptr %15, align 8
> > > > >   %16 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 2
> > > > >   store ptr %6, ptr %16, align 8
> > > > >   %17 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 2
> > > > >   store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %17, align 8
> > > > >   %18 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 3
> > > > >   store ptr %7, ptr %18, align 8
> > > > >   %19 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 3
> > > > >   store ptr %7, ptr %19, align 8
> > > > >   %20 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 3
> > > > >   store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %20, align 8
> > > > >   %21 = getelementptr inbounds [4 x ptr], ptr %2, i32 0, i32 0
> > > > >   %22 = getelementptr inbounds [4 x ptr], ptr %3, i32 0, i32 0
> > > > >   %23 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 0
> > > > >   call void @__tgt_target_data_begin_mapper(ptr @5, i64 -1, i32 4, ptr %21, ptr %22, ptr %23, ptr @.offload_maptypes, ptr @.offload_mapnames, ptr null)
> > > > >   ret void
> > > > > }
> > > > > 
> > > > > ; Function Attrs: nounwind
> > > > > declare void @__tgt_target_data_begin_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #0
> > > > > 
> > > > > ; Function Attrs: nounwind
> > > > > declare void @__tgt_target_data_end_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #0
> > > > > 
> > > > > attributes #0 = { nounwind }
> > > > > 
> > > > > !llvm.module.flags = !{!0}
> > > > > 
> > > > > !0 = !{i32 2, !"Debug Info Version", i32 3}
> > > > > ```
> > > > > 
> > > > > 
> > > > > If I am missing some important types here then please let me know, I'll try to see if they work and if not I'll add support for them in further patches.
> > > > In general how are you passing the size of the fortran variable/type to the OpenMP runtime? For scalars  and arrays with sizes known at compile time, this comes from the type itself. But for other types like assumed-shape arrays, variable length arrays this information comes from the descriptor or from other fields. My question is how is this being collected and passed to the runtime?
> > > > 
> > > > For all the types, I see the following code in the IR you gave above for generating the `ArgSizes` argument of `__tgt_target_data_begin_mapper`. I don't understand how the code (and size) be the same for all the types.
> > > > ```
> > > > ...
> > > > %11 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 0
> > > > store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %11, align 8
> > > > ...
> > > > %14 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 1
> > > > store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %14, align 8
> > > > ...
> > > > %17 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 2
> > > > store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %17, align 8
> > > > ...
> > > > %20 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 3
> > > > store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %20, align 8
> > > > ...
> > > > %23 = getelementptr inbounds [4 x i64], ptr %4, i32 0, i32 0
> > > > call void @__tgt_target_data_begin_mapper(ptr @5, i64 -1, i32 4, ptr %21, ptr %22, ptr %23, ptr @.offload_maptypes, ptr @.offload_mapnames, ptr null)
> > > > ```
> > > > 
> > > > I would like some more clarity on this before proceeding. Clang generates different code for this and I see that it is appropriately filling the `ArgSizes` field.
> > > `OpenMPIRBuilder::getSizeInBytes` is the function responsible for calculating the `ArgSizes`.
> > > 
> > > For the Value : `%1 = alloca i64, i64 1, align 8` it returns size as `i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64)` and TBH I don't understand this. This function was taken from OpenACC.
> > > 
> > > I will re-implement this function and update the patch.
> > Actually, the generated code seems correct. The first answer [[ https://stackoverflow.com/questions/14608250/how-can-i-find-the-size-of-a-type | here ]] gives insight into how `OpenMPIRBuilder::getSizeInBytes` is calculating the size of the type.
> > 
> > Opaque pointers make it look the same for all the different types, disabling opaque pointers you get something like the following:
> > 
> > `integer(8) :: a` :
> > ```
> >    %7 = getelementptr inbounds [1 x i64], [1 x i64]* %.offload_sizes, i32 0, i32 0
> >    store i64 ptrtoint (i64** getelementptr (i64*, i64** null, i32 1) to i64), i64* %7, align 4
> > ```
> > 
> > `integer :: b(1024)` :
> > 
> > ```
> >   %8 = getelementptr inbounds [1 x i64], [1 x i64]* %.offload_sizes, i32 0, i32 0
> >   store i64 ptrtoint ([1024 x i32]** getelementptr ([1024 x i32]*, [1024 x i32]** null, i32 1) to i64), i64* %8, align 4
> > ```
> > 
> > Let me know if this makes sense.
> > 
> > Thanks,
> > Akash
> Ahh OK. It does make it a bit harder to read.
> 
> But going to back to my general question:
> ```
> In general how are you passing the size of the fortran variable/type to the OpenMP runtime? For scalars and arrays with sizes known at compile time, this comes from the type itself. But for other types like assumed-shape arrays, variable length arrays this information comes from the descriptor or from other fields. My question is how is this being collected and passed to the runtime
> ```
> 
> Consider the following two subroutines, It has two assumed shape arrays. In sb0 it is a rank-1 array, in sb1 it is a rank-2 array. At the llvm dialect layer, these two will be represented by struct equivalents of Fortran descriptors as given below. If we now find the size of the types, it would get the size of the descriptor struct rather than it memory it is referring to. I guess this is not what we want to do. I believe this would require some special processing, unless the patch also does something for this.
> 
> ```
> omp.target_data   map((to -> %arg0 : !llvm.ptr<struct<(ptr<i32>, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>>))
> ```
> 
> ```
> omp.target_data   map((to -> %arg0 : !llvm.ptr<struct<(ptr<i32>, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>>))
> ```
> 
> 
> ```
> subroutine sb0(a)
>    integer :: a(:)
>    !$omp target data map(to: a)
>    a(10) = 20
>    !$omp end target data
> end subroutine
> 
> subroutine sb1(a)
>    integer :: a(:,:)
>    !$omp target data map(to: a)
>    a(5,6) = 20
>    !$omp end target data
> end subroutine
> ```
Just want to clarify that I am not expecting a fix here. But just a statement about what is supported and what is not supported.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142914/new/

https://reviews.llvm.org/D142914