[Mlir-commits] [mlir] [MLIR][OpenMP] Skip host omp ops when compiling for the target device (PR #85239)
Sergio Afonso
llvmlistbot at llvm.org
Mon Mar 18 04:55:39 PDT 2024
================
@@ -2922,6 +2922,178 @@ convertDeclareTargetAttr(Operation *op, mlir::omp::DeclareTargetAttr attribute,
return success();
}
+static bool isInternalTargetDeviceOp(Operation *op) {
+ // Assumes no reverse offloading
+ if (op->getParentOfType<omp::TargetOp>())
+ return true;
+
+ auto parentFn = op->getParentOfType<LLVM::LLVMFuncOp>();
+ if (auto declareTargetIface =
+ llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(
+ parentFn.getOperation()))
+ if (declareTargetIface.isDeclareTarget() &&
+ declareTargetIface.getDeclareTargetDeviceType() !=
+ mlir::omp::DeclareTargetDeviceType::host)
+ return true;
+
+ return false;
+}
+
+/// Given an OpenMP MLIR operation, create the corresponding LLVM IR
+/// (including OpenMP runtime calls).
+static LogicalResult
+convertCommonOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) {
+
+ llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
+
+ return llvm::TypeSwitch<Operation *, LogicalResult>(op)
+ .Case([&](omp::BarrierOp) {
+ ompBuilder->createBarrier(builder.saveIP(), llvm::omp::OMPD_barrier);
+ return success();
+ })
+ .Case([&](omp::TaskwaitOp) {
+ ompBuilder->createTaskwait(builder.saveIP());
+ return success();
+ })
+ .Case([&](omp::TaskyieldOp) {
+ ompBuilder->createTaskyield(builder.saveIP());
+ return success();
+ })
+ .Case([&](omp::FlushOp) {
+ // No support in Openmp runtime function (__kmpc_flush) to accept
+ // the argument list.
+ // OpenMP standard states the following:
+ // "An implementation may implement a flush with a list by ignoring
+ // the list, and treating it the same as a flush without a list."
+ //
+ // The argument list is discarded so that, flush with a list is treated
+ // same as a flush without a list.
+ ompBuilder->createFlush(builder.saveIP());
+ return success();
+ })
+ .Case([&](omp::ParallelOp op) {
+ return convertOmpParallel(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::ReductionOp reductionOp) {
+ return convertOmpReductionOp(reductionOp, builder, moduleTranslation);
+ })
+ .Case([&](omp::MasterOp) {
+ return convertOmpMaster(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::CriticalOp) {
+ return convertOmpCritical(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::OrderedRegionOp) {
+ return convertOmpOrderedRegion(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::OrderedOp) {
+ return convertOmpOrdered(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::WsLoopOp) {
+ return convertOmpWsLoop(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::SimdLoopOp) {
+ return convertOmpSimdLoop(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::AtomicReadOp) {
+ return convertOmpAtomicRead(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::AtomicWriteOp) {
+ return convertOmpAtomicWrite(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::AtomicUpdateOp op) {
+ return convertOmpAtomicUpdate(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::AtomicCaptureOp op) {
+ return convertOmpAtomicCapture(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::SectionsOp) {
+ return convertOmpSections(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::SingleOp op) {
+ return convertOmpSingle(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TeamsOp op) {
+ return convertOmpTeams(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TaskOp op) {
+ return convertOmpTaskOp(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TaskGroupOp op) {
+ return convertOmpTaskgroupOp(op, builder, moduleTranslation);
+ })
+ .Case<omp::YieldOp, omp::TerminatorOp, omp::ReductionDeclareOp,
+ omp::CriticalDeclareOp>([](auto op) {
+ // `yield` and `terminator` can be just omitted. The block structure
+ // was created in the region that handles their parent operation.
+ // `reduction.declare` will be used by reductions and is not
+ // converted directly, skip it.
+ // `critical.declare` is only used to declare names of critical
+ // sections which will be used by `critical` ops and hence can be
+ // ignored for lowering. The OpenMP IRBuilder will create unique
+ // name for critical section names.
+ return success();
+ })
+ .Case([&](omp::ThreadprivateOp) {
+ return convertOmpThreadprivate(*op, builder, moduleTranslation);
+ })
+ .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
+ [&](auto op) {
+ return convertOmpTargetData(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TargetOp) {
+ return convertOmpTarget(*op, builder, moduleTranslation);
+ })
+ .Case<omp::MapInfoOp, omp::DataBoundsOp, omp::PrivateClauseOp>(
+ [&](auto op) {
+ // No-op, should be handled by relevant owning operations e.g.
+ // TargetOp, EnterDataOp, ExitDataOp, DataOp etc. and then
+ // discarded
+ return success();
+ })
+ .Default([&](Operation *inst) {
+ return inst->emitError("unsupported OpenMP operation: ")
+ << inst->getName();
+ });
+}
+
+static LogicalResult
+convertInternalTargetOp(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) {
+ return convertCommonOperation(op, builder, moduleTranslation);
+}
+
+static LogicalResult
+convertTopLevelTargetOp(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) {
+ return llvm::TypeSwitch<Operation *, LogicalResult>(op)
+ .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
+ [&](auto op) {
+ return convertOmpTargetData(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TargetOp) {
+ return convertOmpTarget(*op, builder, moduleTranslation);
+ })
+ // Skip omp ops that are not legal top level ops for the target device
----------------
skatrak wrote:
> Probably not all operations with inner regions can contain top-level ops, but that should be handled earlier in the compiler, so we shouldn't have to check for illegal combinations.

I agree, this would be way too late to realize that OpenMP operations were nested in an unsupported way.

> I think we are okay since omp.target is isolated from above, so I believe there shouldn't be any values that need to be translated (codegen seems to work for the omp.task { omp.target {}} case).

I would think that the region inside `omp.target` should be translatable for that reason. My concern is more about the arguments to that operation. For example, `convertOmpTarget` does some processing of map arguments for both host and device. Will that still work if the ops defining those arguments were never translated to LLVM IR, because the enclosing region was skipped?
```mlir
%c1 = llvm.mlir.constant(1 : i64) : i64
%x = llvm.alloca %c1 x i32 : (i64) -> !llvm.ptr
omp.task {
  %map = omp.map_info var_ptr(%x : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr
  omp.target map_entries(%map -> %arg0 : !llvm.ptr) {
  ^bb0(%arg0: !llvm.ptr):
    ...
    omp.terminator
  }
  omp.terminator
}
```
If that code makes this approach crash, we would have to rethink it: we would need to either create some dummy values, or somehow detect missing LLVM IR values and create them on demand. I think the former is basically what the early outlining did with the outlined function's arguments, if I remember correctly.
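To make the "detect missing values and create them on demand" option a bit more concrete, here is a toy C++ sketch of the lookup pattern. It deliberately does not use MLIR's `ModuleTranslation` or any LLVM API: `ToyTranslation`, `ToyIRValue`, `mapValue`, and `lookupOrCreatePlaceholder` are hypothetical names invented for illustration, and values are modelled as plain strings. It only shows the control flow, assuming a translation table keyed by SSA value name.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a translated LLVM IR value (not an LLVM type).
struct ToyIRValue {
  std::string text;   // textual form of the translated value
  bool isPlaceholder; // true if it was created on demand as a dummy
};

// Hypothetical stand-in for a value-mapping table (not mlir::LLVM::ModuleTranslation).
class ToyTranslation {
public:
  // Record the translation of a value that was lowered normally.
  void mapValue(const std::string &name, ToyIRValue ir) {
    assert(!name.empty() && "value name must not be empty");
    mapping[name] = std::move(ir);
  }

  // Return the existing translation if there is one. Otherwise create a
  // placeholder, remember it, and return it, so repeated lookups of the
  // same missing value reuse a single dummy.
  const ToyIRValue &lookupOrCreatePlaceholder(const std::string &name) {
    auto it = mapping.find(name);
    if (it != mapping.end())
      return it->second;
    ++placeholdersCreated;
    return mapping.emplace(name, ToyIRValue{"placeholder." + name, true})
        .first->second;
  }

  int placeholdersCreated = 0;

private:
  std::unordered_map<std::string, ToyIRValue> mapping;
};
```

In terms of the example above, a value such as `%map`, whose defining op sits in a skipped region, would be the one that hits the placeholder path. Whether a dummy value is actually acceptable to the map-argument processing in `convertOmpTarget` is exactly the open question, and this sketch does not answer it.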
https://github.com/llvm/llvm-project/pull/85239