[Mlir-commits] [mlir] [Flang][OpenMP][MLIR] Initial array section mapping MLIR -> LLVM-IR lowering utilising omp.bounds (PR #68689)

Tue Oct 17 07:43:26 PDT 2023

================
@@ -1629,13 +1622,153 @@ getRefPtrIfDeclareTarget(mlir::Value value,
   return nullptr;
 }
 
+// A small helper structure to contain data gathered
+// for map lowering and coalesce it into one area and
+// avoiding extra computations such as searches in the
+// llvm module for lowered mapped variables or checking
+// if something is declare target (and retrieving the
+// value).
+struct MapData {
+  bool isDeclareTarget = false;
+  mlir::Operation *mapClause;
+  llvm::Value *basePointer;
+  llvm::Value *pointer;
+  llvm::Value *kernelValue;
+  llvm::Type *underlyingType;
+  llvm::Value *sizeInBytes;
+};
+
+uint64_t getArrayElementSizeInBits(LLVM::LLVMArrayType arrTy, DataLayout &dl) {
+  if (auto nestedArrTy = llvm::dyn_cast_if_present<LLVM::LLVMArrayType>(
+          arrTy.getElementType()))
+    return getArrayElementSizeInBits(nestedArrTy, dl);
+  return dl.getTypeSizeInBits(arrTy.getElementType());
+}
+
+// This function calculates the size to be offloaded for a specified type, given
+// its associated map clause (which can contain bounds information which affects
+// the total size), this size is calculated based on the underlying element type
+// e.g. given a 1-D array of ints, we will calculate the size from the integer
+// type * number of elements in the array. This size can be used in other
+// calculations but is ultimately used as an argument to the OpenMP runtimes
+// kernel argument structure which is generated through the combinedInfo data
+// structures.
+// This function is somewhat equivalent to Clang's getExprTypeSize inside of
+// CGOpenMPRuntime.cpp.
+llvm::Value *getSizeInBytes(DataLayout &dl, const mlir::Type &type,
----------------
TIFitis wrote:

Consider the following example:
```
subroutine omp_target(a, b, c)
   integer, intent(in) :: a, b, c
   integer :: x(a, b, c)
   !$omp target map(tofrom : x)
   !$omp end target
end subroutine omp_target
```

Here's a slice of the generated LLVM IR:
```
  %.offload_sizes = alloca [1 x i64], align 8
  %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8
  %4 = load i32, ptr %0, align 4
  %5 = sext i32 %4 to i64
  %6 = icmp sgt i64 %5, 0
  %7 = select i1 %6, i64 %5, i64 0
  %8 = load i32, ptr %1, align 4
  %9 = sext i32 %8 to i64
  %10 = icmp sgt i64 %9, 0
  %11 = select i1 %10, i64 %9, i64 0
  %12 = load i32, ptr %2, align 4
  %13 = sext i32 %12 to i64
  %14 = icmp sgt i64 %13, 0
  %15 = select i1 %14, i64 %13, i64 0
  %16 = mul i64 1, %7
  %17 = mul i64 %16, %11
  %18 = mul i64 %17, %15
  %19 = alloca i32, i64 %18, align 4
  %20 = sub i64 %7, 1
  %21 = sub i64 %11, 1
  %22 = sub i64 %15, 1
  br label %entry

entry:                                            ; preds = %3
  %23 = sub i64 %20, 0
  %24 = add i64 %23, 1
  %25 = sub i64 %21, 0
  %26 = add i64 %25, 1
  %27 = mul i64 %24, %26
  %28 = sub i64 %22, 0
  %29 = add i64 %28, 1
  %30 = mul i64 %27, %29
  %31 = mul i64 %30, 4
  %34 = getelementptr inbounds [1 x i64], ptr %.offload_sizes, i32 0, i32 0
  store i64 %31, ptr %34, align 8
```

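The `entry` block above recomputes the element count as `%30`, which is the same value as `%18`, already computed for the `alloca` instruction. Each upper bound is the extent minus 1 and each lower bound is 0, so `(ub - lb) + 1` just gives back the extent.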
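To make the redundancy concrete, here is a minimal C++ sketch of the two computations in the IR above. The function names are hypothetical and do not appear in the patch; the sketch assumes three `i32` extents and a 4-byte element type, as in the example.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of the patch.

// Mirrors %4..%18: clamp each extent to zero, then multiply them together.
int64_t elementCountFromExtents(const std::array<int32_t, 3> &dims) {
  int64_t count = 1;
  for (int32_t d : dims)
    count *= std::max<int64_t>(d, 0);
  return count;
}

// Mirrors %20..%30: upper bound = extent - 1, lower bound = 0,
// and the per-dimension size is (upper - lower) + 1.
int64_t elementCountFromBounds(const std::array<int32_t, 3> &dims) {
  int64_t count = 1;
  for (int32_t d : dims) {
    int64_t extent = std::max<int64_t>(d, 0);
    int64_t upper = extent - 1;
    int64_t lower = 0;
    count *= (upper - lower) + 1;
  }
  return count;
}

// Mirrors %31: element count times the size of i32 in bytes.
int64_t offloadSizeInBytes(int64_t elementCount) { return elementCount * 4; }
```

Both functions return the same count for any input, which is why the value feeding the `alloca` could in principle be reused for the offload size when no explicit bounds are given.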

My view is that we don't need a boundsOp, and should neither generate nor use one, unless the user has provided explicit bounds.

Others, however, have already said that they would like a boundsOp to be present at all times and used whenever possible, as it favours a single solution for all cases. I am not strictly against this, but I prefer the former approach.

And from what I can tell, Clang also reuses `%18` for the `offload_size` here.


https://github.com/llvm/llvm-project/pull/68689