[Openmp-commits] [mlir] [openmp] [MLIR][OpenMP] Refactor bounds offsetting and fix to apply to all directives (PR #84349)
via Openmp-commits
openmp-commits at lists.llvm.org
Thu Mar 7 09:48:38 PST 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-mlir-openmp
Author: None (agozillon)
<details>
<summary>Changes</summary>
This PR refactors bounds offsetting by combining the two differing implementations (one applying to initial derived type member map implementation for descriptors and the other for regular arrays, effectively allocatable array vs regular array in fortran) now that it's a little simpler to do.
The PR also moves the utilization of createAlteredByCaptureMap into genMapInfoOp, where it will be correctly applied to all MapInfoData, appropriately offsetting and altering Pointer data set in the kernel argument structure on the host. This primarily means bounds offsets will now correctly apply to enter/exit/update map clauses as opposed to just the Target directive that is currently the case. A few fortran runtime tests have been added to verify this new behavior.
This PR depends on: https://github.com/llvm/llvm-project/pull/84328 and is an extraction of the larger derived type member map PR stack (so a requirement for it to land).
---
Patch is 31.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84349.diff
6 Files Affected:
- (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+179-143)
- (modified) mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-types-host.mlir (+4-5)
- (modified) mlir/test/Target/LLVMIR/omptarget-llvm.mlir (+15-10)
- (added) openmp/libomptarget/test/offloading/fortran/target-map-enter-exit-array-2.f90 (+39)
- (added) openmp/libomptarget/test/offloading/fortran/target-map-enter-exit-array-bounds.f90 (+44)
- (added) openmp/libomptarget/test/offloading/fortran/target-map-enter-exit-scalar.f90 (+33)
``````````diff
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index bef227f2c583ed..ec30005b9a565e 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1937,6 +1937,99 @@ void collectMapDataFromMapOperands(MapInfoData &mapData,
}
}
+/// This function calculates the array/pointer offset for map data provided
+/// with bounds operations, e.g. when provided something like the following:
+///
+/// Fortran
+/// map(tofrom: array(2:5, 3:2))
+/// or
+/// C++
+/// map(tofrom: array[1:4][2:3])
+/// We must calculate the initial pointer offset to pass across, this function
+/// performs this using bounds.
+///
+/// NOTE: which while specified in row-major order it currently needs to be
+/// flipped for Fortran's column order array allocation and access (as
+/// opposed to C++'s row-major, hence the backwards processing where order is
+/// important). This is likely important to keep in mind for the future when
+/// we incorporate a C++ frontend, both frontends will need to agree on the
+/// ordering of generated bounds operations (one may have to flip them) to
+/// make the below lowering frontend agnostic. The offload size
+/// calcualtion may also have to be adjusted for C++.
+std::vector<llvm::Value *>
+calculateBoundsOffset(LLVM::ModuleTranslation &moduleTranslation,
+ llvm::IRBuilderBase &builder, bool isArrayTy,
+ mlir::OperandRange bounds) {
+ std::vector<llvm::Value *> idx;
+ // There's no bounds to calculate an offset from, we can safely
+ // ignore and return no indices.
+ if (bounds.empty())
+ return idx;
+
+ // If we have an array type, then we have its type so can treat it as a
+ // normal GEP instruction where the bounds operations are simply indexes
+ // into the array. We currently do reverse order of the bounds, which
+ // I believe leans more towards Fortran's column-major in memory.
+ if (isArrayTy) {
+ idx.push_back(builder.getInt64(0));
+ for (int i = bounds.size() - 1; i >= 0; --i) {
+ if (auto boundOp = mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
+ bounds[i].getDefiningOp())) {
+ idx.push_back(moduleTranslation.lookupValue(boundOp.getLowerBound()));
+ }
+ }
+ } else {
+ // If we do not have an array type, but we have bounds, then we're dealing
+ // with a pointer that's being treated like an array and we have the
+ // underlying type e.g. an i32, or f64 etc, e.g. a fortran descriptor base
+ // address (pointer pointing to the actual data) so we must caclulate the
+ // offset using a single index which the following two loops attempts to
+ // compute.
+
+ // Calculates the size offset we need to make per row e.g. first row or
+ // column only needs to be offset by one, but the next would have to be
+ // the previous row/column offset multiplied by the extent of current row.
+ //
+ // For example ([1][10][100]):
+ //
+ // - First row/column we move by 1 for each index increment
+ // - Second row/column we move by 1 (first row/column) * 10 (extent/size of
+ // current) for 10 for each index increment
+ // - Third row/column we would move by 10 (second row/column) *
+ // (extent/size of current) 100 for 1000 for each index increment
+ std::vector<llvm::Value *> dimensionIndexSizeOffset{builder.getInt64(1)};
+ for (size_t i = 1; i < bounds.size(); ++i) {
+ if (auto boundOp = mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
+ bounds[i].getDefiningOp())) {
+ dimensionIndexSizeOffset.push_back(builder.CreateMul(
+ moduleTranslation.lookupValue(boundOp.getExtent()),
+ dimensionIndexSizeOffset[i - 1]));
+ }
+ }
+
+ // Now that we have calculated how much we move by per index, we must
+ // multiply each lower bound offset in indexes by the size offset we
+ // have calculated in the previous and accumulate the results to get
+ // our final resulting offset.
+ for (int i = bounds.size() - 1; i >= 0; --i) {
+ if (auto boundOp = mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
+ bounds[i].getDefiningOp())) {
+ if (idx.empty())
+ idx.emplace_back(builder.CreateMul(
+ moduleTranslation.lookupValue(boundOp.getLowerBound()),
+ dimensionIndexSizeOffset[i]));
+ else
+ idx.back() = builder.CreateAdd(
+ idx.back(), builder.CreateMul(moduleTranslation.lookupValue(
+ boundOp.getLowerBound()),
+ dimensionIndexSizeOffset[i]));
+ }
+ }
+ }
+
+ return idx;
+}
+
// This creates two insertions into the MapInfosTy data structure for the
// "parent" of a set of members, (usually a container e.g.
// class/structure/derived type) when subsequent members have also been
@@ -2047,55 +2140,7 @@ static void processMapMembersWithParent(
LLVM::createMappingInformation(memberClause.getLoc(), ompBuilder));
combinedInfo.BasePointers.emplace_back(mapData.BasePointers[memberDataIdx]);
-
- std::vector<llvm::Value *> idx{builder.getInt64(0)};
- llvm::Value *offsetAddress = nullptr;
- if (!memberClause.getBounds().empty()) {
- if (mapData.BaseType[memberDataIdx]->isArrayTy()) {
- for (int i = memberClause.getBounds().size() - 1; i >= 0; --i) {
- if (auto boundOp = mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
- memberClause.getBounds()[i].getDefiningOp())) {
- idx.push_back(
- moduleTranslation.lookupValue(boundOp.getLowerBound()));
- }
- }
- } else {
- std::vector<llvm::Value *> dimensionIndexSizeOffset{
- builder.getInt64(1)};
- for (size_t i = 1; i < memberClause.getBounds().size(); ++i) {
- if (auto boundOp = mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
- memberClause.getBounds()[i].getDefiningOp())) {
- dimensionIndexSizeOffset.push_back(builder.CreateMul(
- moduleTranslation.lookupValue(boundOp.getExtent()),
- dimensionIndexSizeOffset[i - 1]));
- }
- }
-
- for (int i = memberClause.getBounds().size() - 1; i >= 0; --i) {
- if (auto boundOp = mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
- memberClause.getBounds()[i].getDefiningOp())) {
- if (!offsetAddress)
- offsetAddress = builder.CreateMul(
- moduleTranslation.lookupValue(boundOp.getLowerBound()),
- dimensionIndexSizeOffset[i]);
- else
- offsetAddress = builder.CreateAdd(
- offsetAddress,
- builder.CreateMul(
- moduleTranslation.lookupValue(boundOp.getLowerBound()),
- dimensionIndexSizeOffset[i]));
- }
- }
- }
- }
-
- llvm::Value *memberIdx =
- builder.CreateLoad(builder.getPtrTy(), mapData.Pointers[memberDataIdx]);
- memberIdx = builder.CreateInBoundsGEP(
- mapData.BaseType[memberDataIdx], memberIdx,
- offsetAddress ? std::vector<llvm::Value *>{offsetAddress} : idx,
- "member_idx");
- combinedInfo.Pointers.emplace_back(memberIdx);
+ combinedInfo.Pointers.emplace_back(mapData.Pointers[memberDataIdx]);
combinedInfo.Sizes.emplace_back(mapData.Sizes[memberDataIdx]);
}
}
@@ -2113,6 +2158,77 @@ static void processMapWithMembersOf(
memberOfParentFlag);
}
+// This is a variation on Clang's GenerateOpenMPCapturedVars, which
+// generates different operation (e.g. load/store) combinations for
+// arguments to the kernel, based on map capture kinds which are then
+// utilised in the combinedInfo in place of the original Map value.
+static void
+createAlteredByCaptureMap(MapInfoData &mapData,
+ LLVM::ModuleTranslation &moduleTranslation,
+ llvm::IRBuilderBase &builder) {
+ for (size_t i = 0; i < mapData.MapClause.size(); ++i) {
+ // if it's declare target, skip it, it's handled seperately.
+ if (!mapData.IsDeclareTarget[i]) {
+ auto mapOp =
+ mlir::dyn_cast_if_present<mlir::omp::MapInfoOp>(mapData.MapClause[i]);
+ mlir::omp::VariableCaptureKind captureKind =
+ mapOp.getMapCaptureType().value_or(
+ mlir::omp::VariableCaptureKind::ByRef);
+ bool isPtrTy = checkIfPointerMap(
+ llvm::omp::OpenMPOffloadMappingFlags(mapOp.getMapType().value()));
+
+ // Currently handles array sectioning lowerbound case, but more
+ // logic may be required in the future. Clang invokes EmitLValue,
+ // which has specialised logic for special Clang types such as user
+ // defines, so it is possible we will have to extend this for
+ // structures or other complex types. As the general idea is that this
+ // function mimics some of the logic from Clang that we require for
+ // kernel argument passing from host -> device.
+ switch (captureKind) {
+ case mlir::omp::VariableCaptureKind::ByRef: {
+ llvm::Value *newV = mapData.Pointers[i];
+ std::vector<llvm::Value *> offsetIdx = calculateBoundsOffset(
+ moduleTranslation, builder, mapData.BaseType[i]->isArrayTy(),
+ mapOp.getBounds());
+ if (isPtrTy)
+ newV = builder.CreateLoad(builder.getPtrTy(), newV);
+
+ if (!offsetIdx.empty())
+ newV = builder.CreateInBoundsGEP(mapData.BaseType[i], newV, offsetIdx,
+ "array_offset");
+ mapData.Pointers[i] = newV;
+ } break;
+ case mlir::omp::VariableCaptureKind::ByCopy: {
+ llvm::Type *type = mapData.BaseType[i];
+ llvm::Value *newV;
+ if (mapData.Pointers[i]->getType()->isPointerTy())
+ newV = builder.CreateLoad(type, mapData.Pointers[i]);
+ else
+ newV = mapData.Pointers[i];
+
+ if (!isPtrTy) {
+ auto curInsert = builder.saveIP();
+ builder.restoreIP(findAllocaInsertPoint(builder, moduleTranslation));
+ auto *memTempAlloc =
+ builder.CreateAlloca(builder.getPtrTy(), nullptr, ".casted");
+ builder.restoreIP(curInsert);
+
+ builder.CreateStore(newV, memTempAlloc);
+ newV = builder.CreateLoad(builder.getPtrTy(), memTempAlloc);
+ }
+
+ mapData.Pointers[i] = newV;
+ mapData.BasePointers[i] = newV;
+ } break;
+ case mlir::omp::VariableCaptureKind::This:
+ case mlir::omp::VariableCaptureKind::VLAType:
+ mapData.MapClause[i]->emitOpError("Unhandled capture kind");
+ break;
+ }
+ }
+ }
+}
+
// Generate all map related information and fill the combinedInfo.
static void genMapInfos(llvm::IRBuilderBase &builder,
LLVM::ModuleTranslation &moduleTranslation,
@@ -2122,6 +2238,20 @@ static void genMapInfos(llvm::IRBuilderBase &builder,
const SmallVector<Value> &devPtrOperands = {},
const SmallVector<Value> &devAddrOperands = {},
bool isTargetParams = false) {
+ // We wish to modify some of the methods in which arguments are
+ // passed based on their capture type by the target region, this can
+ // involve generating new loads and stores, which changes the
+ // MLIR value to LLVM value mapping, however, we only wish to do this
+ // locally for the current function/target and also avoid altering
+ // ModuleTranslation, so we remap the base pointer or pointer stored
+ // in the map infos corresponding MapInfoData, which is later accessed
+ // by genMapInfos and createTarget to help generate the kernel and
+ // kernel arg structure. It primarily becomes relevant in cases like
+ // bycopy, or byref range'd arrays. In the default case, we simply
+ // pass thee pointer byref as both basePointer and pointer.
+ if (!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice())
+ createAlteredByCaptureMap(mapData, moduleTranslation, builder);
+
llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
auto fail = [&combinedInfo]() -> void {
@@ -2616,86 +2746,6 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg,
return builder.saveIP();
}
-// This is a variation on Clang's GenerateOpenMPCapturedVars, which
-// generates different operation (e.g. load/store) combinations for
-// arguments to the kernel, based on map capture kinds which are then
-// utilised in the combinedInfo in place of the original Map value.
-static void
-createAlteredByCaptureMap(MapInfoData &mapData,
- LLVM::ModuleTranslation &moduleTranslation,
- llvm::IRBuilderBase &builder) {
- for (size_t i = 0; i < mapData.MapClause.size(); ++i) {
- // if it's declare target, skip it, it's handled seperately.
- if (!mapData.IsDeclareTarget[i]) {
- mlir::omp::VariableCaptureKind captureKind =
- mlir::omp::VariableCaptureKind::ByRef;
-
- if (auto mapOp = mlir::dyn_cast_if_present<mlir::omp::MapInfoOp>(
- mapData.MapClause[i])) {
- captureKind = mapOp.getMapCaptureType().value_or(
- mlir::omp::VariableCaptureKind::ByRef);
- }
-
- switch (captureKind) {
- case mlir::omp::VariableCaptureKind::ByRef: {
- // Currently handles array sectioning lowerbound case, but more
- // logic may be required in the future. Clang invokes EmitLValue,
- // which has specialised logic for special Clang types such as user
- // defines, so it is possible we will have to extend this for
- // structures or other complex types. As the general idea is that this
- // function mimics some of the logic from Clang that we require for
- // kernel argument passing from host -> device.
- if (auto mapOp = mlir::dyn_cast_if_present<mlir::omp::MapInfoOp>(
- mapData.MapClause[i])) {
- if (!mapOp.getBounds().empty() && mapData.BaseType[i]->isArrayTy()) {
-
- std::vector<llvm::Value *> idx =
- std::vector<llvm::Value *>{builder.getInt64(0)};
- for (int i = mapOp.getBounds().size() - 1; i >= 0; --i) {
- if (auto boundOp =
- mlir::dyn_cast_if_present<mlir::omp::DataBoundsOp>(
- mapOp.getBounds()[i].getDefiningOp())) {
- idx.push_back(
- moduleTranslation.lookupValue(boundOp.getLowerBound()));
- }
- }
-
- mapData.Pointers[i] = builder.CreateInBoundsGEP(
- mapData.BaseType[i], mapData.Pointers[i], idx);
- }
- }
- } break;
- case mlir::omp::VariableCaptureKind::ByCopy: {
- llvm::Type *type = mapData.BaseType[i];
- llvm::Value *newV;
- if (mapData.Pointers[i]->getType()->isPointerTy())
- newV = builder.CreateLoad(type, mapData.Pointers[i]);
- else
- newV = mapData.Pointers[i];
-
- if (!type->isPointerTy()) {
- auto curInsert = builder.saveIP();
- builder.restoreIP(findAllocaInsertPoint(builder, moduleTranslation));
- auto *memTempAlloc =
- builder.CreateAlloca(builder.getPtrTy(), nullptr, ".casted");
- builder.restoreIP(curInsert);
-
- builder.CreateStore(newV, memTempAlloc);
- newV = builder.CreateLoad(builder.getPtrTy(), memTempAlloc);
- }
-
- mapData.Pointers[i] = newV;
- mapData.BasePointers[i] = newV;
- } break;
- case mlir::omp::VariableCaptureKind::This:
- case mlir::omp::VariableCaptureKind::VLAType:
- mapData.MapClause[i]->emitOpError("Unhandled capture kind");
- break;
- }
- }
- }
-}
-
static LogicalResult
convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
LLVM::ModuleTranslation &moduleTranslation) {
@@ -2764,20 +2814,6 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
collectMapDataFromMapOperands(mapData, mapOperands, moduleTranslation, dl,
builder);
- // We wish to modify some of the methods in which kernel arguments are
- // passed based on their capture type by the target region, this can
- // involve generating new loads and stores, which changes the
- // MLIR value to LLVM value mapping, however, we only wish to do this
- // locally for the current function/target and also avoid altering
- // ModuleTranslation, so we remap the base pointer or pointer stored
- // in the map infos corresponding MapInfoData, which is later accessed
- // by genMapInfos and createTarget to help generate the kernel and
- // kernel arg structure. It primarily becomes relevant in cases like
- // bycopy, or byref range'd arrays. In the default case, we simply
- // pass thee pointer byref as both basePointer and pointer.
- if (!moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice())
- createAlteredByCaptureMap(mapData, moduleTranslation, builder);
-
llvm::OpenMPIRBuilder::MapInfosTy combinedInfos;
auto genMapInfoCB = [&](llvm::OpenMPIRBuilder::InsertPointTy codeGenIP)
-> llvm::OpenMPIRBuilder::MapInfosTy & {
diff --git a/mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-types-host.mlir b/mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-types-host.mlir
index 831cd05871c4e4..8ab6d5deaadac1 100644
--- a/mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-types-host.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-types-host.mlir
@@ -81,20 +81,19 @@ module attributes {omp.is_target_device = false} {
// CHECK: %[[ARR_SECT_SIZE2:.*]] = add i64 %[[ARR_SECT_SIZE3]], 1
// CHECK: %[[ARR_SECT_SIZE1:.*]] = mul i64 1, %[[ARR_SECT_SIZE2]]
// CHECK: %[[ARR_SECT_SIZE:.*]] = mul i64 %[[ARR_SECT_SIZE1]], 4
-// CHECK: %[[FULL_ARR_DESC_SIZE:.*]] = sdiv exact i64 sub (i64 ptrtoint (ptr getelementptr inbounds ({ ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr @_QFEfull_arr, i32 1) to i64), i64 ptrtoint (ptr @_QFEfull_arr to i64)), ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
// CHECK: %[[LFULL_ARR:.*]] = load ptr, ptr @_QFEfull_arr, align 8
// CHECK: %[[FULL_ARR_PTR:.*]] = getelementptr inbounds float, ptr %[[LFULL_ARR]], i64 0
-// CHECK: %[[ARR_SECT_DESC_SIZE:.*]] = sdiv exact i64 sub (i64 ptrtoint (ptr getelementptr inbounds ({ ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr @_QFEsect_arr, i32 1) to i64), i64 ptrtoint (ptr @_QFEsect_arr to i64)), ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
// CHECK: %[[ARR_SECT_OFFSET1:.*]] = mul i64 %[[ARR_SECT_OFFSET2]], 1
// CHECK: %[[LARR_SECT:.*]] = load ptr, ptr @_QFEsect_arr, align 8
// CHECK: %[[ARR_SECT_PTR:.*]] = getelementptr inbounds i32, ptr %[[LARR_SECT]], i64 %[[ARR_SECT_OFFSET1]]
+// CHECK: %[[SCALAR_PTR_LOAD:.*]] = load ptr, ptr %[[SCALAR_BASE]], align 8
+// CHECK: %[[FULL_ARR_DESC_SIZE:.*]] = sdiv exact i64 sub (i64 ptrtoint (ptr getelementptr inbounds ({ ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr @_QFEfull_arr, i32 1) to i64), i64 ptrtoint (ptr @_QFEfull_arr to i64)), ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
+// CHECK: %[[ARR_SECT_DESC_SIZE:.*]] = sdiv exact i64 sub (i64 ptrtoint (ptr getelementptr inbounds ({ ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr @_QFEsect_arr, i32 1) to i64), i64 ptrtoint (ptr @_QFEsect_arr to i64)), ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
// CHECK: %[[SCALAR_DESC_SZ4:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 }, ptr %[[SCALAR_ALLOCA]], i32 1
// CHECK: %[[SCALAR_DESC_SZ3:.*]] = ptrtoint ptr %[[SCALAR_DESC_SZ4]] to i64
// CHECK: %[[SCALAR_DESC_SZ2:.*]] = ptrtoint ptr %[[SCALAR_ALLOCA]] to i64
// CHECK: %[[SCALAR_DESC_SZ1:.*]] = sub i64 %[[SCALAR_DESC_SZ3]], %[[SCALAR_DESC_SZ2]]
// CHECK: %[[SCALAR_DESC_SZ:.*]] = sdiv exact i64 %[[SCALAR_DESC_SZ1]], ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
-// CHECK: %[[SCALAR_PTR_LOAD:.*]] = load ptr, ptr %[[SCALAR_BASE]], align 8
-// CHECK: %[[SCALAR_PTR:.*]] = getelementptr inbounds float, ptr %[[SCALAR_PTR_LOAD]]...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/84349
More information about the Openmp-commits
mailing list