[Mlir-commits] [mlir] [mlir][OpenMP] delayed privatisation for TASK (PR #114785)

Wed Nov 6 01:47:40 PST 2024

================
@@ -1250,6 +1249,79 @@ static LogicalResult allocAndInitializeReductionVars(
   return success();
 }
 
+/// Allocate delayed private variables. Returns the basic block which comes
+/// after all of these allocations. llvm::Value * for each of these private
+/// variables are populated in llvmPrivateVars.
+template <class OP>
+static llvm::Expected<llvm::BasicBlock *>
+allocatePrivateVars(OP opInst, llvm::IRBuilderBase &builder,
+                    LLVM::ModuleTranslation &moduleTranslation,
+                    MutableArrayRef<BlockArgument> privateBlockArgs,
+                    MutableArrayRef<omp::PrivateClauseOp> privateDecls,
+                    llvm::SmallVector<llvm::Value *> &llvmPrivateVars,
+                    const llvm::OpenMPIRBuilder::InsertPointTy &allocaIP) {
+  // Allocate private vars
+  llvm::BranchInst *allocaTerminator =
+      llvm::cast<llvm::BranchInst>(allocaIP.getBlock()->getTerminator());
+  builder.SetInsertPoint(allocaTerminator);
+  assert(allocaTerminator->getNumSuccessors() == 1 &&
+         "This is an unconditional branch created by OpenMPIRBuilder");
+  llvm::BasicBlock *afterAllocas = allocaTerminator->getSuccessor(0);
+
+  // FIXME: Some of the allocation regions do more than just allocating.
+  // They read from their block argument (amongst other non-alloca things).
+  // When OpenMPIRBuilder outlines the parallel region into a different
+  // function it places the loads for live in-values (such as these block
+  // arguments) at the end of the entry block (because the entry block is
+  // assumed to contain only allocas). Therefore, if we put these complicated
+  // alloc blocks in the entry block, these will not dominate the availability
+  // of the live-in values they are using. Fix this by adding a latealloc
+  // block after the entry block to put these in (this also helps to avoid
+  // mixing non-alloca code with allocas).
+  // Alloc regions which do not use the block argument can still be placed in
+  // the entry block (therefore keeping the allocas together).
+  llvm::BasicBlock *privAllocBlock = nullptr;
+  if (!privateBlockArgs.empty())
+    privAllocBlock = splitBB(builder, true, "omp.private.latealloc");
+  for (unsigned i = 0; i < privateBlockArgs.size(); ++i) {
+    Region &allocRegion = privateDecls[i].getAllocRegion();
+
+    // map allocation region block argument
+    llvm::Value *nonPrivateVar =
+        moduleTranslation.lookupValue(opInst.getPrivateVars()[i]);
+    assert(nonPrivateVar);
+    moduleTranslation.mapValue(privateDecls[i].getAllocMoldArg(),
+                               nonPrivateVar);
+
+    // in-place convert the private allocation region
+    SmallVector<llvm::Value *, 1> phis;
+    if (privateDecls[i].getAllocMoldArg().getUses().empty()) {
+      // TODO this should use
+      // allocaIP.getBlock()->getFirstNonPHIOrDbgOrAlloca() so it goes before
+      // the code for fetching the thread id. Not doing this for now to avoid
+      // test churn.
----------------
NimishMishra wrote:

I may have missed something obvious here, but what is the exact issue here?

https://github.com/llvm/llvm-project/pull/114785