[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)

Tom Eccles via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Thu Jan 23 07:20:32 PST 2025


https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/124019

From 9d963b7be234331d6d7e3ec79be518ea7a90ba88 Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Thu, 28 Nov 2024 15:51:24 +0000
Subject: [PATCH 01/12] [mlir][OpenMP] Change the definition of omp.private

The intention of this work is to give the MLIR->LLVMIR conversion the
freedom to control how the private variable is allocated, so that it
can be allocated on the stack in ordinary cases, or as part of a
structure that provides closure context for tasks which might outlive
the current stack frame. See RFC:
https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084
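A loose C++ analogue of this split, purely for illustration: the privatizer only describes *what* is privatized (a size plus optional init/copy logic), and the lowering decides *where* the private copy lives. None of these names (`Privatizer`, `materializePrivate`, `TaskContext`, `copyInt`) exist in MLIR or flang; they are hypothetical stand-ins, and the real translation generates LLVM IR rather than calling callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical model, not the MLIR API: the privatizer only describes what
// is privatized (its size) and how to fill it; it never allocates.
struct Privatizer {
  std::size_t size;
  void (*init)(const void *mold, void *priv); // optional, may be null
  void (*copy)(const void *host, void *priv); // optional, firstprivate only
};

// The lowering picks the storage; the privatizer logic only fills it in.
void *materializePrivate(const Privatizer &p, const void *host,
                         void *storage) {
  if (p.init)
    p.init(host, storage);
  if (p.copy)
    p.copy(host, storage);
  return storage;
}

// Storage for a deferred task: owned by a heap-allocated closure context,
// so it stays valid after the frame that created the task has returned.
struct TaskContext {
  std::unique_ptr<unsigned char[]> data;
  explicit TaskContext(std::size_t n) : data(new unsigned char[n]) {}
};

// Example firstprivate copy for an int.
void copyInt(const void *host, void *priv) {
  std::memcpy(priv, host, sizeof(int));
}
```

In the ordinary case the caller passes the address of a local (stack) slot; for a task that may outlive the frame it passes `ctx.data.get()` instead. The privatizer logic is identical in both cases, which is the point of removing the `alloc` region.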

In flang, before this patch we hit a TODO with the same wording when
generating the copy region for firstprivate polymorphic variables. After
this patch, the box-like fir.class is passed by reference into the copy
region. This takes a different path that avoids the old TODO, but the
generated code still didn't work, so I added a new TODO in
DataSharingProcessor.

---

Please read the mlir changes first, then the flang changes.

In flang lowering I box up all arrays and pass the boxes by reference so
that the existing code for reduction init and dealloc regions can be
shared.
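To illustrate why boxing plus pass-by-reference lets one init/cleanup path serve both reductions and privatization, here is a loose C++ analogue. `Box` stands in for a fir.box descriptor and every name below is hypothetical; the real code is `populateByRefInitAndCleanupRegions`, which emits FIR regions rather than executing anything. Passing a null init value plays the role of `scalarInitValue=nullptr` in the private case, and the zero-extent check mirrors the "read the length from the mold argument, if it is non-zero..." branch seen in the generated IR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Fortran array descriptor (fir.box):
// a base address plus an extent.
struct Box {
  double *base = nullptr;
  std::size_t extent = 0;
};

// Shared "init" logic: allocate private storage shaped like the mold.
// Both boxes are passed by reference, like the init region's two
// block arguments (mold, private).
void initFromMold(const Box *mold, Box *priv, const double *initValue) {
  priv->extent = mold->extent;
  priv->base = mold->extent ? new double[mold->extent] : nullptr;
  if (initValue)
    for (std::size_t i = 0; i < priv->extent; ++i)
      priv->base[i] = *initValue;
}

// Shared cleanup: free the private storage.
void cleanup(Box *priv) {
  delete[] priv->base;
  priv->base = nullptr;
  priv->extent = 0;
}

// A reduction supplies an identity value; plain privatization supplies
// none, so the contents are left uninitialized.
void initReduction(const Box *mold, Box *priv) {
  const double zero = 0.0;
  initFromMold(mold, priv, &zero);
}

void initPrivate(const Box *mold, Box *priv) {
  initFromMold(mold, priv, nullptr);
}
```

Because both entry points funnel into the same by-reference routine, adding privatization support mostly means reusing the reduction path, which is why the test diffs below largely reflect the reduction init codegen.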

The TODOs for pointers, derived types, characters, etc. will be resolved
in later patches in this same series (to be squashed into this one). I
kept them separate to make the series easier to review.

Assumed rank was already broken before this patch. I can't find any
mention of "assumed rank" in the OpenMP standard, so I assume it is not
prohibited.

Other than the omp.private operation definition changes, the test changes
are mostly down to slightly different codegen from re-using the reduction
init region. That code is already well tested so I didn't want to change it.
---
 .../lib/Lower/OpenMP/DataSharingProcessor.cpp | 121 +++++------
 flang/lib/Optimizer/CodeGen/CodeGenOpenMP.cpp |  36 ++++
 ...lias-analysis-omp-private-allocatable.mlir |  10 +-
 ...ysis-omp-teams-distribute-private-ptr.mlir |  14 +-
 ...analysis-omp-teams-distribute-private.mlir |  24 +--
 flang/test/Fir/boxproc-openmp.fir             |  35 +---
 flang/test/HLFIR/opt-variable-assign-omp.fir  |   9 +-
 .../parallel-private-reduction-worstcase.f90  | 188 +++++++-----------
 .../Integration/OpenMP/private-global.f90     |  18 +-
 .../distribute-standalone-private.f90         |   4 +-
 .../target-private-allocatable.f90            |  26 +--
 .../target-private-simple.f90                 |   7 +-
 .../OpenMP/DelayedPrivatization/wsloop.f90    |   4 +-
 .../OpenMP/cfg-conversion-omp.private.f90     |  25 +--
 ...elayed-privatization-allocatable-array.f90 |  33 ++-
 ...privatization-allocatable-firstprivate.f90 |   4 +-
 ...ayed-privatization-allocatable-private.f90 |  27 +--
 .../OpenMP/delayed-privatization-array.f90    |  86 +++++---
 .../delayed-privatization-character-array.f90 |  33 +--
 .../delayed-privatization-firstprivate.f90    |   7 +-
 ...yed-privatization-private-firstprivate.f90 |   6 +-
 .../OpenMP/delayed-privatization-private.f90  |   8 +-
 .../delayed-privatization-reduction-byref.f90 |   2 +-
 .../delayed-privatization-reduction.f90       |   2 +-
 flang/test/Lower/OpenMP/implicit-dsa.f90      |  85 ++++----
 flang/test/Lower/OpenMP/loop-directive.f90    |   4 +-
 .../parallel-firstprivate-clause-scalar.f90   |  20 +-
 .../OpenMP/same_var_first_lastprivate.f90     |   8 +-
 flang/test/Lower/OpenMP/simd.f90              |   2 +-
 flang/test/Lower/OpenMP/task2.f90             |   2 +-
 .../Transforms/generic-loop-rewriting.mlir    |   5 +-
 .../omp-maps-for-privatized-symbols.fir       |  12 +-
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td |  75 ++++---
 .../Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp  |  18 +-
 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp  |  26 ++-
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp      | 102 +++++-----
 .../OpenMPToLLVM/convert-to-llvmir.mlir       |  26 +--
 mlir/test/Dialect/OpenMP/invalid.mlir         |  63 +++---
 mlir/test/Dialect/OpenMP/ops.mlir             |  27 ++-
 .../Target/LLVMIR/openmp-firstprivate.mlir    |  51 ++---
 mlir/test/Target/LLVMIR/openmp-llvm.mlir      |  15 +-
 .../LLVMIR/openmp-omp.private-dealloc.mlir    |   7 +-
 mlir/test/Target/LLVMIR/openmp-private.mlir   |  75 +++----
 .../Target/LLVMIR/openmp-simd-private.mlir    |  20 +-
 .../openmp-target-multiple-private.mlir       |  15 +-
 .../openmp-target-private-allocatable.mlir    |   8 +-
 .../Target/LLVMIR/openmp-target-private.mlir  |  41 ++--
 .../LLVMIR/openmp-target-simd-on_device.mlir  |   4 +-
 mlir/test/Target/LLVMIR/openmp-todo.mlir      |  41 ++--
 .../LLVMIR/openmp-wsloop-private-cond_br.mlir |  13 +-
 .../Target/LLVMIR/openmp-wsloop-private.mlir  |  18 +-
 51 files changed, 659 insertions(+), 853 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 5b89816850beda..44ec6b798c7c0d 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -12,6 +12,7 @@
 
 #include "DataSharingProcessor.h"
 
+#include "PrivateReductionUtils.h"
 #include "Utils.h"
 #include "flang/Lower/ConvertVariable.h"
 #include "flang/Lower/PFTBuilder.h"
@@ -19,6 +20,7 @@
 #include "flang/Optimizer/Builder/HLFIRTools.h"
 #include "flang/Optimizer/Builder/Todo.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "flang/Semantics/attr.h"
 #include "flang/Semantics/tools.h"
 
 namespace Fortran {
@@ -85,28 +87,8 @@ void DataSharingProcessor::insertDeallocs() {
         converter.createHostAssociateVarCloneDealloc(*sym);
         continue;
       }
-
-      lower::SymbolBox hsb = converter.lookupOneLevelUpSymbol(*sym);
-      assert(hsb && "Host symbol box not found");
-      mlir::Type symType = hsb.getAddr().getType();
-      mlir::Location symLoc = hsb.getAddr().getLoc();
-      fir::ExtendedValue symExV = converter.getSymbolExtendedValue(*sym);
-      mlir::omp::PrivateClauseOp privatizer = symToPrivatizer.at(sym);
-
-      lower::SymMapScope scope(symTable);
-      mlir::OpBuilder::InsertionGuard guard(firOpBuilder);
-
-      mlir::Region &deallocRegion = privatizer.getDeallocRegion();
-      fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
-      mlir::Block *deallocEntryBlock = firOpBuilder.createBlock(
-          &deallocRegion, /*insertPt=*/{}, symType, symLoc);
-
-      firOpBuilder.setInsertionPointToEnd(deallocEntryBlock);
-      symTable.addSymbol(*sym,
-                         fir::substBase(symExV, deallocRegion.getArgument(0)));
-
-      converter.createHostAssociateVarCloneDealloc(*sym);
-      firOpBuilder.create<mlir::omp::YieldOp>(hsb.getAddr().getLoc());
+      // For delayed privatization deallocs are created by
+      // populateByRefInitAndCleanupRegions
     }
 }
 
@@ -468,15 +450,47 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
   lower::SymbolBox hsb = converter.lookupOneLevelUpSymbol(*sym);
   assert(hsb && "Host symbol box not found");
 
-  mlir::Type symType = hsb.getAddr().getType();
+  mlir::Value privVal = hsb.getAddr();
+  mlir::Type allocType = fir::unwrapRefType(privVal.getType());
   mlir::Location symLoc = hsb.getAddr().getLoc();
   std::string privatizerName = sym->name().ToString() + ".privatizer";
   bool isFirstPrivate = sym->test(semantics::Symbol::Flag::OmpFirstPrivate);
 
+  if (mlir::isa<fir::PointerType>(hsb.getAddr().getType()))
+    TODO(symLoc, "Privatization of pointers");
+
+  if (auto poly = mlir::dyn_cast<fir::ClassType>(allocType)) {
+    if (!mlir::isa<fir::PointerType>(poly.getEleTy()) && isFirstPrivate)
+      TODO(symLoc, "create polymorphic host associated copy");
+  }
+
+  // fir.array<> cannot be converted to any single llvm type and fir helpers
+  // are not available in openmp to llvmir translation so we cannot generate
+  // an alloca for a fir.array type there. Get around this by boxing all
+  // arrays.
+  if (mlir::isa<fir::SequenceType>(allocType)) {
+    hlfir::Entity entity{hsb.getAddr()};
+    entity = genVariableBox(symLoc, firOpBuilder, entity);
+    privVal = entity.getBase();
+    allocType = privVal.getType();
+  }
+
+  if (mlir::isa<fir::BaseBoxType>(privVal.getType())) {
+    // Boxes should be passed by reference into nested regions:
+    auto oldIP = firOpBuilder.saveInsertionPoint();
+    firOpBuilder.setInsertionPointToStart(firOpBuilder.getAllocaBlock());
+    auto alloca = firOpBuilder.create<fir::AllocaOp>(symLoc, privVal.getType());
+    firOpBuilder.restoreInsertionPoint(oldIP);
+    firOpBuilder.create<fir::StoreOp>(symLoc, privVal, alloca);
+    privVal = alloca;
+  }
+
+  mlir::Type argType = privVal.getType();
+
   mlir::omp::PrivateClauseOp privatizerOp = [&]() {
     auto moduleOp = firOpBuilder.getModule();
     auto uniquePrivatizerName = fir::getTypeAsString(
-        symType, converter.getKindMap(),
+        allocType, converter.getKindMap(),
         converter.mangleName(*sym) +
             (isFirstPrivate ? "_firstprivate" : "_private"));
 
@@ -488,44 +502,33 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
     mlir::OpBuilder::InsertionGuard guard(firOpBuilder);
     firOpBuilder.setInsertionPointToStart(moduleOp.getBody());
     auto result = firOpBuilder.create<mlir::omp::PrivateClauseOp>(
-        symLoc, uniquePrivatizerName, symType,
+        symLoc, uniquePrivatizerName, allocType,
         isFirstPrivate ? mlir::omp::DataSharingClauseType::FirstPrivate
                        : mlir::omp::DataSharingClauseType::Private);
     fir::ExtendedValue symExV = converter.getSymbolExtendedValue(*sym);
     lower::SymMapScope outerScope(symTable);
 
-    // Populate the `alloc` region.
-    {
-      mlir::Region &allocRegion = result.getAllocRegion();
-      mlir::Block *allocEntryBlock = firOpBuilder.createBlock(
-          &allocRegion, /*insertPt=*/{}, symType, symLoc);
-
-      firOpBuilder.setInsertionPointToEnd(allocEntryBlock);
-
-      fir::ExtendedValue localExV =
-          hlfir::translateToExtendedValue(
-              symLoc, firOpBuilder, hlfir::Entity{allocRegion.getArgument(0)},
-              /*contiguousHint=*/
-              evaluate::IsSimplyContiguous(*sym, converter.getFoldingContext()))
-              .first;
-
-      symTable.addSymbol(*sym, localExV);
-      lower::SymMapScope innerScope(symTable);
-      cloneSymbol(sym);
-      mlir::Value cloneAddr = symTable.shallowLookupSymbol(*sym).getAddr();
-      mlir::Type cloneType = cloneAddr.getType();
-
-      // A `convert` op is required for variables that are storage associated
-      // via `equivalence`. The problem is that these variables are declared as
-      // `fir.ptr`s while their privatized storage is declared as `fir.ref`,
-      // therefore we convert to proper symbol type.
-      mlir::Value yieldedValue =
-          (symType == cloneType) ? cloneAddr
-                                 : firOpBuilder.createConvert(
-                                       cloneAddr.getLoc(), symType, cloneAddr);
-
-      firOpBuilder.create<mlir::omp::YieldOp>(hsb.getAddr().getLoc(),
-                                              yieldedValue);
+    // Populate the `init` region.
+    const bool needsInitialization =
+        !fir::isa_trivial(allocType) ||
+        Fortran::lower::hasDefaultInitialization(sym->GetUltimate());
+    if (needsInitialization) {
+      mlir::Region &initRegion = result.getInitRegion();
+      mlir::Block *initBlock = firOpBuilder.createBlock(
+          &initRegion, /*insertPt=*/{}, {argType, argType}, {symLoc, symLoc});
+
+      if (fir::isa_char(allocType))
+        TODO(symLoc, "Privatization init of characters");
+      if (fir::isa_derived(allocType))
+        TODO(symLoc, "Privatization init of derived types");
+      if (Fortran::lower::hasDefaultInitialization(sym->GetUltimate()))
+        TODO(symLoc,
+             "Privatization init of symbol with default initialization");
+
+      populateByRefInitAndCleanupRegions(
+          firOpBuilder, symLoc, argType, /*scalarInitValue=*/nullptr, initBlock,
+          result.getInitPrivateArg(), result.getInitMoldArg(),
+          result.getDeallocRegion());
     }
 
     // Populate the `copy` region if this is a `firstprivate`.
@@ -534,7 +537,7 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
       // First block argument corresponding to the original/host value while
       // second block argument corresponding to the privatized value.
       mlir::Block *copyEntryBlock = firOpBuilder.createBlock(
-          &copyRegion, /*insertPt=*/{}, {symType, symType}, {symLoc, symLoc});
+          &copyRegion, /*insertPt=*/{}, {argType, argType}, {symLoc, symLoc});
       firOpBuilder.setInsertionPointToEnd(copyEntryBlock);
 
       auto addSymbol = [&](unsigned argIdx, bool force = false) {
@@ -565,7 +568,7 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
 
   if (clauseOps) {
     clauseOps->privateSyms.push_back(mlir::SymbolRefAttr::get(privatizerOp));
-    clauseOps->privateVars.push_back(hsb.getAddr());
+    clauseOps->privateVars.push_back(privVal);
   }
 
   symToPrivatizer[sym] = privatizerOp;
diff --git a/flang/lib/Optimizer/CodeGen/CodeGenOpenMP.cpp b/flang/lib/Optimizer/CodeGen/CodeGenOpenMP.cpp
index da13ed648e44e6..37f1c9f97e1ce2 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGenOpenMP.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGenOpenMP.cpp
@@ -90,9 +90,45 @@ struct MapInfoOpConversion
     return mlir::success();
   }
 };
+
+// FIR op specific conversion for PrivateClauseOp that overrides the default
+// OpenMP Dialect lowering. This allows FIR-aware lowering of types, which is
+// required for boxes because the OpenMP dialect conversion doesn't know
+// anything about FIR types.
+struct PrivateClauseOpConversion
+    : public OpenMPFIROpConversion<mlir::omp::PrivateClauseOp> {
+  using OpenMPFIROpConversion::OpenMPFIROpConversion;
+
+  llvm::LogicalResult
+  matchAndRewrite(mlir::omp::PrivateClauseOp curOp, OpAdaptor adaptor,
+                  mlir::ConversionPatternRewriter &rewriter) const override {
+    const fir::LLVMTypeConverter &converter = lowerTy();
+    mlir::Type convertedAllocType;
+    if (auto box = mlir::dyn_cast<fir::BaseBoxType>(curOp.getType())) {
+      // In LLVM codegen fir.box<> == fir.ref<fir.box<>> == llvm.ptr
+      // Here we really do want the actual structure
+      if (box.isAssumedRank())
+        TODO(curOp->getLoc(), "Privatize an assumed rank array");
+      unsigned rank = 0;
+      if (auto seqTy = mlir::dyn_cast<fir::SequenceType>(
+              fir::unwrapRefType(box.getEleTy())))
+        rank = seqTy.getShape().size();
+      convertedAllocType = converter.convertBoxTypeAsStruct(box, rank);
+    } else {
+      convertedAllocType = converter.convertType(adaptor.getType());
+    }
+    if (!convertedAllocType)
+      return mlir::failure();
+    rewriter.startOpModification(curOp);
+    curOp.setType(convertedAllocType);
+    rewriter.finalizeOpModification(curOp);
+    return mlir::success();
+  }
+};
 } // namespace
 
 void fir::populateOpenMPFIRToLLVMConversionPatterns(
     const LLVMTypeConverter &converter, mlir::RewritePatternSet &patterns) {
   patterns.add<MapInfoOpConversion>(converter);
+  patterns.add<PrivateClauseOpConversion>(converter);
 }
diff --git a/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-private-allocatable.mlir b/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-private-allocatable.mlir
index 5116622364fad8..e19885c71a9f87 100644
--- a/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-private-allocatable.mlir
+++ b/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-private-allocatable.mlir
@@ -20,15 +20,7 @@
 // CHECK: ar2#0 <-> ar1#1: NoAlias
 // CHECK: ar2#1 <-> ar1#1: NoAlias
 
-omp.private {type = private} @_QFmysubEar1_private_ref_box_heap_Uxf64 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>> alloc {
-^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>):
-  %0 = fir.alloca !fir.box<!fir.heap<!fir.array<?xf64>>> {bindc_name = "ar1", pinned, uniq_name = "_QFmysubEar1"}
-  %5:2 = hlfir.declare %0 {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFmysubEar1"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>)
-  omp.yield(%5#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>)
-} dealloc {
-^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.array<?xf64>>>>):
-  omp.yield
-}
+omp.private {type = private} @_QFmysubEar1_private_ref_box_heap_Uxf64 : !fir.box<!fir.heap<!fir.array<?xf64>>>
 func.func @testPrivateAllocatable(%arg0: !fir.ref<i32> {fir.bindc_name = "ns"}, %arg1: !fir.ref<i32> {fir.bindc_name = "ne"}) {
   %0 = fir.dummy_scope : !fir.dscope
   %1 = fir.alloca !fir.box<!fir.heap<!fir.array<?xf64>>> {bindc_name = "ar1", uniq_name = "_QFmysubEar1"}
diff --git a/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-teams-distribute-private-ptr.mlir b/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-teams-distribute-private-ptr.mlir
index 78207d21c45bf3..b60fbe4152fc12 100644
--- a/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-teams-distribute-private-ptr.mlir
+++ b/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-teams-distribute-private-ptr.mlir
@@ -17,18 +17,8 @@
 // CHECK-LABEL: Testing : "_QQmain"
 // CHECK-DAG:   ptrA#0 <-> ArrayA#0: MayAlias
 
-omp.private {type = private} @_QFEi_private_ref_i32 : !fir.ref<i32> alloc {
-^bb0(%arg0: !fir.ref<i32>):
-  %0 = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
-  %1:2 = hlfir.declare %0 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-  omp.yield(%1#0 : !fir.ref<i32>)
-}
-omp.private {type = firstprivate} @_QFEptra_firstprivate_ref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> alloc {
-^bb0(%arg0: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>):
-  %0 = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> {bindc_name = "ptra", pinned, uniq_name = "_QFEptra"}
-  %1:2 = hlfir.declare %0 {fortran_attrs = #fir.var_attrs<pointer>, uniq_name = "_QFEptra"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>)
-  omp.yield(%1#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>)
-} copy {
+omp.private {type = private} @_QFEi_private_ref_i32 : i32
+omp.private {type = firstprivate} @_QFEptra_firstprivate_ref_box_ptr_Uxi32 : !fir.box<!fir.ptr<!fir.array<?xi32>>> copy {
 ^bb0(%arg0: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, %arg1: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>):
   %0 = fir.load %arg0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>
   fir.store %0 to %arg1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>
diff --git a/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-teams-distribute-private.mlir b/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-teams-distribute-private.mlir
index 4668b2c215c8c3..7f60a6fa0803a8 100644
--- a/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-teams-distribute-private.mlir
+++ b/flang/test/Analysis/AliasAnalysis/alias-analysis-omp-teams-distribute-private.mlir
@@ -21,26 +21,10 @@
 // CHECK-DAG: tmp_private_array#0 <-> unnamed_array#0: NoAlias
 // CHECK-DAG: tmp_private_array#1 <-> unnamed_array#0: NoAlias
 
-omp.private {type = private} @_QFEi_private_ref_i32 : !fir.ref<i32> alloc {
-^bb0(%arg0: !fir.ref<i32>):
-  %0 = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
-  %1:2 = hlfir.declare %0 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-  omp.yield(%1#0 : !fir.ref<i32>)
-}
-omp.private {type = private} @_QFEj_private_ref_i32 : !fir.ref<i32> alloc {
-^bb0(%arg0: !fir.ref<i32>):
-  %0 = fir.alloca i32 {bindc_name = "j", pinned, uniq_name = "_QFEj"}
-  %1:2 = hlfir.declare %0 {uniq_name = "_QFEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-  omp.yield(%1#0 : !fir.ref<i32>)
-}
-omp.private {type = private} @_QFEtmp_private_ref_2xi32 : !fir.ref<!fir.array<2xi32>> alloc {
-^bb0(%arg0: !fir.ref<!fir.array<2xi32>>):
-  %c2 = arith.constant 2 : index
-  %0 = fir.alloca !fir.array<2xi32> {bindc_name = "tmp", pinned, uniq_name = "_QFEtmp"}
-  %1 = fir.shape %c2 : (index) -> !fir.shape<1>
-  %2:2 = hlfir.declare %0(%1) {uniq_name = "_QFEtmp"} : (!fir.ref<!fir.array<2xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<2xi32>>, !fir.ref<!fir.array<2xi32>>)
-  omp.yield(%2#0 : !fir.ref<!fir.array<2xi32>>)
-}
+omp.private {type = private} @_QFEi_private_ref_i32 : i32
+omp.private {type = private} @_QFEj_private_ref_i32 : i32
+omp.private {type = private} @_QFEtmp_private_ref_2xi32 : !fir.array<2xi32>
+
 func.func @_QQmain() attributes {fir.bindc_name = "main"} {
   %0 = fir.address_of(@_QFEarraya) : !fir.ref<!fir.array<10x10xi32>>
   %c10 = arith.constant 10 : index
diff --git a/flang/test/Fir/boxproc-openmp.fir b/flang/test/Fir/boxproc-openmp.fir
index 9db053ad93c665..4f62b0a4a42b25 100644
--- a/flang/test/Fir/boxproc-openmp.fir
+++ b/flang/test/Fir/boxproc-openmp.fir
@@ -3,26 +3,13 @@
 // Check minimally, only arguments, yields and the private types.
 
 // Test a private declaration with one region (alloc)
-//CHECK: omp.private {type = private}  @_QFsub1Et1_private_ref_rec__QFsub1Tt : !fir.ref<!fir.type<_QFsub1TtUnboxProc{p1:() -> ()}>> alloc {
-omp.private {type = private} @_QFsub1Et1_private_ref_rec__QFsub1Tt : !fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>> alloc {
-//CHECK: ^bb0(%{{.*}}: !fir.ref<!fir.type<_QFsub1TtUnboxProc{p1:() -> ()}>>):
-^bb0(%arg0: !fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>):
-  %c1_i32 = arith.constant 1 : i32
-  %0 = fir.alloca !fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}> {bindc_name = "t1", pinned, uniq_name = "_QFsub1Et1"}
-  %1 = fir.declare %0 {uniq_name = "_QFsub1Et1"} : (!fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>) -> !fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>
-  %2 = fir.embox %1 : (!fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>) -> !fir.box<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>
-  %3 = fir.address_of(@_QQclXea6256ba131ddd9c2210e68030a0edd3) : !fir.ref<!fir.char<1,49>>
-  %4 = fir.convert %2 : (!fir.box<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>) -> !fir.box<none>
-  %5 = fir.convert %3 : (!fir.ref<!fir.char<1,49>>) -> !fir.ref<i8>
-  fir.call @_FortranAInitialize(%4, %5, %c1_i32) fastmath<contract> : (!fir.box<none>, !fir.ref<i8>, i32) -> ()
-//CHECK: omp.yield(%{{.*}} : !fir.ref<!fir.type<_QFsub1TtUnboxProc{p1:() -> ()}>>)
-  omp.yield(%1 : !fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>)
-}
+//CHECK: omp.private {type = private}  @_QFsub1Et1_private_rec__QFsub1Tt : !fir.type<_QFsub1TtUnboxProc{p1:() -> ()}>{{$}}
+omp.private {type = private} @_QFsub1Et1_private_rec__QFsub1Tt : !fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>
 func.func @_QPsub1() {
   %0 = fir.alloca !fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}> {bindc_name = "t1", uniq_name = "_QFsub1Et1"}
   %1 = fir.declare %0 {uniq_name = "_QFsub1Et1"} : (!fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>) -> !fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>
-//CHECK: omp.parallel private(@_QFsub1Et1_private_ref_rec__QFsub1Tt %{{.*}} -> %{{.*}} : !fir.ref<!fir.type<_QFsub1TtUnboxProc{p1:() -> ()}>>) {
-  omp.parallel private(@_QFsub1Et1_private_ref_rec__QFsub1Tt %1 -> %arg0 : !fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>) {
+//CHECK: omp.parallel private(@_QFsub1Et1_private_rec__QFsub1Tt %{{.*}} -> %{{.*}} : !fir.ref<!fir.type<_QFsub1TtUnboxProc{p1:() -> ()}>>) {
+  omp.parallel private(@_QFsub1Et1_private_rec__QFsub1Tt %1 -> %arg0 : !fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>) {
     %2 = fir.declare %arg0 {uniq_name = "_QFsub1Et1"} : (!fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>) -> !fir.ref<!fir.type<_QFsub1Tt{p1:!fir.boxproc<() -> ()>}>>
     omp.terminator
   }
@@ -31,11 +18,11 @@ func.func @_QPsub1() {
 
 
 // Test a private declaration with all regions (alloc, copy, dealloc)
-//CHECK: omp.private {type = firstprivate} @_QFsub2Et1_firstprivate_ref_box_heap_rec__QFsub2Tt : 
-//CHECK-SAME: !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2TtUnboxProc{p1:() -> ()}>>>> alloc {
-omp.private {type = firstprivate} @_QFsub2Et1_firstprivate_ref_box_heap_rec__QFsub2Tt : !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>> alloc {
-//CHECK: ^bb0(%{{.*}}: !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2TtUnboxProc{p1:() -> ()}>>>>):
-^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>):
+//CHECK: omp.private {type = firstprivate} @_QFsub2Et1_firstprivate_box_heap_rec__QFsub2Tt :
+//CHECK-SAME: [[TYPE:!fir.box<!fir.heap<!fir.type<_QFsub2TtUnboxProc\{p1:\(\) -> \(\)\}>>>]] init {
+omp.private {type = firstprivate} @_QFsub2Et1_firstprivate_box_heap_rec__QFsub2Tt : !fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>> init {
+//CHECK: ^bb0(%{{.*}}: !fir.ref<[[TYPE]]>, %{{.*}}: !fir.ref<[[TYPE]]>):
+^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>, %arg1:!fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>):
   %0 = fir.alloca !fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>> {bindc_name = "t1", pinned, uniq_name = "_QFsub2Et1"}
   %1 = fir.declare %0 {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFsub2Et1"} : (!fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>) -> !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>
 //CHECK: omp.yield(%{{.*}} : !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2TtUnboxProc{p1:() -> ()}>>>>)
@@ -70,9 +57,9 @@ omp.private {type = firstprivate} @_QFsub2Et1_firstprivate_ref_box_heap_rec__QFs
 func.func @_QPsub2() {
   %0 = fir.alloca !fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>> {bindc_name = "t1", uniq_name = "_QFsub2Et1"}
   %1 = fir.declare %0 {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFsub2Et1"} : (!fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>) -> !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>
-//CHECK: omp.parallel private(@_QFsub2Et1_firstprivate_ref_box_heap_rec__QFsub2Tt %{{.*}} -> %{{.*}} :
+//CHECK: omp.parallel private(@_QFsub2Et1_firstprivate_box_heap_rec__QFsub2Tt %{{.*}} -> %{{.*}} :
 //CHECK-SAME: !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2TtUnboxProc{p1:() -> ()}>>>>) {
-  omp.parallel private(@_QFsub2Et1_firstprivate_ref_box_heap_rec__QFsub2Tt %1 -> %arg0 : !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>) {
+  omp.parallel private(@_QFsub2Et1_firstprivate_box_heap_rec__QFsub2Tt %1 -> %arg0 : !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>) {
     %2 = fir.declare %arg0 {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFsub2Et1"} : (!fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>) -> !fir.ref<!fir.box<!fir.heap<!fir.type<_QFsub2Tt{p1:!fir.boxproc<() -> ()>}>>>>
     omp.terminator
   }
diff --git a/flang/test/HLFIR/opt-variable-assign-omp.fir b/flang/test/HLFIR/opt-variable-assign-omp.fir
index 10cb2b4408fb86..f3ba53283c7467 100755
--- a/flang/test/HLFIR/opt-variable-assign-omp.fir
+++ b/flang/test/HLFIR/opt-variable-assign-omp.fir
@@ -7,8 +7,8 @@
 // TODO: we can't currently optimize this assign because alias analysis doesn't
 // know that the block arguments of the copy region cannot alias.
 
-omp.private {type = firstprivate} @_QFFbEl_firstprivate_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc {
-^bb0(%arg0: !fir.ref<!fir.box<!fir.array<?xi32>>>):
+omp.private {type = firstprivate} @_QFFbEl_firstprivate_box_Uxi32 : !fir.box<!fir.array<?xi32>> init {
+^bb0(%arg0: !fir.ref<!fir.box<!fir.array<?xi32>>>, %arg1: !fir.ref<!fir.box<!fir.array<?xi32>>>):
   %0 = fir.load %arg0 : !fir.ref<!fir.box<!fir.array<?xi32>>>
   %c0 = arith.constant 0 : index
   %1:3 = fir.box_dims %0, %c0 : (!fir.box<!fir.array<?xi32>>, index) -> (index, index, index)
@@ -20,9 +20,8 @@ omp.private {type = firstprivate} @_QFFbEl_firstprivate_box_Uxi32 : !fir.ref<!fi
   %5:3 = fir.box_dims %0, %c0_0 : (!fir.box<!fir.array<?xi32>>, index) -> (index, index, index)
   %6 = fir.shape_shift %5#0, %5#1 : (index, index) -> !fir.shapeshift<1>
   %7 = fir.rebox %4#0(%6) : (!fir.box<!fir.array<?xi32>>, !fir.shapeshift<1>) -> !fir.box<!fir.array<?xi32>>
-  %8 = fir.alloca !fir.box<!fir.array<?xi32>>
-  fir.store %7 to %8 : !fir.ref<!fir.box<!fir.array<?xi32>>>
-  omp.yield(%8 : !fir.ref<!fir.box<!fir.array<?xi32>>>)
+  fir.store %7 to %arg1 : !fir.ref<!fir.box<!fir.array<?xi32>>>
+  omp.yield(%arg1 : !fir.ref<!fir.box<!fir.array<?xi32>>>)
 } copy {
 ^bb0(%arg0: !fir.ref<!fir.box<!fir.array<?xi32>>>, %arg1 : !fir.ref<!fir.box<!fir.array<?xi32>>>):
   %0 = fir.load %arg0 {test.ptr = "load_from_block_arg"} : !fir.ref<!fir.box<!fir.array<?xi32>>>
diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index 0173847b732359..6facce56123ab2 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -32,69 +32,69 @@ subroutine worst_case(a, b, c, d)
 ! CHECK-LABEL: define internal void @worst_case_..omp_par
 ! CHECK-NEXT:  omp.par.entry:
 !                [reduction alloc regions inlined here]
-! CHECK:         br label %omp.private.latealloc
+! CHECK:         br label %omp.private.init
 
-! CHECK:       omp.private.latealloc:                            ; preds = %omp.par.entry
-! CHECK-NEXT:  br label %omp.private.alloc5
+! CHECK:       omp.private.init:                            ; preds = %omp.par.entry
+! CHECK-NEXT:  br label %omp.private.init7
 
-! CHECK:       omp.private.alloc5:                               ; preds = %omp.private.latealloc
+! CHECK:       omp.private.init7:                               ; preds = %omp.private.init
 !                [begin private alloc for first var]
 !                [read the length from the mold argument]
 !                [if it is non-zero...]
-! CHECK:         br i1 {{.*}}, label %omp.private.alloc6, label %omp.private.alloc7
+! CHECK:         br i1 {{.*}}, label %omp.private.init8, label %omp.private.init9
 
-! CHECK:       omp.private.alloc7:                               ; preds = %omp.private.alloc5
+! CHECK:       omp.private.init9:                               ; preds = %omp.private.init7
 !                [finish private alloc for first var with zero extent]
-! CHECK:         br label %omp.private.alloc8
+! CHECK:         br label %omp.private.init10
 
-! CHECK:       omp.private.alloc8:                               ; preds = %omp.private.alloc6, %omp.private.alloc7
-! CHECK-NEXT:    br label %omp.region.cont4
+! CHECK:       omp.private.init10:                               ; preds = %omp.private.init8, %omp.private.init9
+! CHECK-NEXT:    br label %omp.region.cont6
 
-! CHECK:       omp.region.cont4:                                 ; preds = %omp.private.alloc8
+! CHECK:       omp.region.cont6:                                 ; preds = %omp.private.init10
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.private.alloc
+! CHECK-NEXT:    br label %omp.private.init1
 
-! CHECK:       omp.private.alloc:                                ; preds = %omp.region.cont4
+! CHECK:       omp.private.init1:                                ; preds = %omp.region.cont6
 !                [begin private alloc for first var]
 !                [read the length from the mold argument]
 !                [if it is non-zero...]
-! CHECK:         br i1 %{{.*}}, label %omp.private.alloc1, label %omp.private.alloc2
+! CHECK:         br i1 %{{.*}}, label %omp.private.init2, label %omp.private.init3
 
-! CHECK:       omp.private.alloc2:                               ; preds = %omp.private.alloc
+! CHECK:       omp.private.init3:                               ; preds = %omp.private.init1
 !                [finish private alloc for second var with zero extent]
-! CHECK:         br label %omp.private.alloc3
+! CHECK:         br label %omp.private.init4
 
-! CHECK:       omp.private.alloc3:                               ; preds = %omp.private.alloc1, %omp.private.alloc2
+! CHECK:       omp.private.init4:                               ; preds = %omp.private.init2, %omp.private.init3
 ! CHECK-NEXT:    br label %omp.region.cont
 
-! CHECK:       omp.region.cont:                                  ; preds = %omp.private.alloc3
+! CHECK:       omp.region.cont:                                  ; preds = %omp.private.init4
 ! CHECK-NEXT:    %{{.*}} = phi ptr
 ! CHECK-NEXT:    br label %omp.private.copy
 
 ! CHECK:       omp.private.copy:                                 ; preds = %omp.region.cont
-! CHECK-NEXT:    br label %omp.private.copy10
+! CHECK-NEXT:    br label %omp.private.copy12
 
-! CHECK:       omp.private.copy10:                               ; preds = %omp.private.copy
+! CHECK:       omp.private.copy12:                               ; preds = %omp.private.copy
 !                [begin firstprivate copy for first var]
 !                [read the length, is it non-zero?]
-! CHECK:         br i1 %{{.*}}, label %omp.private.copy11, label %omp.private.copy12
+! CHECK:         br i1 %{{.*}}, label %omp.private.copy13, label %omp.private.copy14
 
-! CHECK:       omp.private.copy12:                               ; preds = %omp.private.copy11, %omp.private.copy10
-! CHECK-NEXT:    br label %omp.region.cont9
+! CHECK:       omp.private.copy14:                               ; preds = %omp.private.copy13, %omp.private.copy12
+! CHECK-NEXT:    br label %omp.region.cont11
 
-! CHECK:       omp.region.cont9:                                 ; preds = %omp.private.copy12
+! CHECK:       omp.region.cont11:                                 ; preds = %omp.private.copy14
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.private.copy14
+! CHECK-NEXT:    br label %omp.private.copy16
 
-! CHECK:       omp.private.copy14:                               ; preds = %omp.region.cont9
+! CHECK:       omp.private.copy16:                               ; preds = %omp.region.cont11
 !                [begin firstprivate copy for second var]
 !                [read the length, is it non-zero?]
-! CHECK:         br i1 %{{.*}}, label %omp.private.copy15, label %omp.private.copy16
+! CHECK:         br i1 %{{.*}}, label %omp.private.copy17, label %omp.private.copy18
 
-! CHECK:       omp.private.copy16:                               ; preds = %omp.private.copy15, %omp.private.copy14
-! CHECK-NEXT:    br label %omp.region.cont13
+! CHECK:       omp.private.copy18:                               ; preds = %omp.private.copy17, %omp.private.copy16
+! CHECK-NEXT:    br label %omp.region.cont15
 
-! CHECK:       omp.region.cont13:                                ; preds = %omp.private.copy16
+! CHECK:       omp.region.cont15:                                ; preds = %omp.private.copy18
 ! CHECK-NEXT:    %{{.*}} = phi ptr
 ! CHECK-NEXT:    br label %omp.region.after_alloca
 
@@ -111,44 +111,44 @@ subroutine worst_case(a, b, c, d)
 ! CHECK:       omp.reduction.neutral:                            ; preds = %omp.reduction.init
 !                [start of reduction initialization region]
 !                [null check:]
-! CHECK:         br i1 %{{.*}}, label %omp.reduction.neutral18, label %omp.reduction.neutral19
+! CHECK:         br i1 %{{.*}}, label %omp.reduction.neutral20, label %omp.reduction.neutral21
 
-! CHECK:       omp.reduction.neutral19:                          ; preds = %omp.reduction.neutral
+! CHECK:       omp.reduction.neutral21:                          ; preds = %omp.reduction.neutral
 !                [malloc and assign the default value to the reduction variable]
-! CHECK:         br label %omp.reduction.neutral20
+! CHECK:         br label %omp.reduction.neutral22
 
-! CHECK:       omp.reduction.neutral20:                          ; preds = %omp.reduction.neutral18, %omp.reduction.neutral19
-! CHECK-NEXT:    br label %omp.region.cont17
+! CHECK:       omp.reduction.neutral22:                          ; preds = %omp.reduction.neutral20, %omp.reduction.neutral21
+! CHECK-NEXT:    br label %omp.region.cont19
 
-! CHECK:       omp.region.cont17:                                ; preds = %omp.reduction.neutral20
+! CHECK:       omp.region.cont19:                                ; preds = %omp.reduction.neutral22
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.reduction.neutral22
+! CHECK-NEXT:    br label %omp.reduction.neutral24
 
-! CHECK:       omp.reduction.neutral22:                          ; preds = %omp.region.cont17
+! CHECK:       omp.reduction.neutral24:                          ; preds = %omp.region.cont19
 !                [start of reduction initialization region]
 !                [null check:]
-! CHECK:         br i1 %{{.*}}, label %omp.reduction.neutral23, label %omp.reduction.neutral24
+! CHECK:         br i1 %{{.*}}, label %omp.reduction.neutral25, label %omp.reduction.neutral26
 
-! CHECK:       omp.reduction.neutral24:                          ; preds = %omp.reduction.neutral22
+! CHECK:       omp.reduction.neutral26:                          ; preds = %omp.reduction.neutral24
 !                [malloc and assign the default value to the reduction variable]
-! CHECK:         br label %omp.reduction.neutral25
+! CHECK:         br label %omp.reduction.neutral27
 
-! CHECK:       omp.reduction.neutral25:                          ; preds = %omp.reduction.neutral23, %omp.reduction.neutral24
-! CHECK-NEXT:    br label %omp.region.cont21
+! CHECK:       omp.reduction.neutral27:                          ; preds = %omp.reduction.neutral25, %omp.reduction.neutral26
+! CHECK-NEXT:    br label %omp.region.cont23
 
-! CHECK:       omp.region.cont21:                                ; preds = %omp.reduction.neutral25
+! CHECK:       omp.region.cont23:                                ; preds = %omp.reduction.neutral27
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.par.region27
+! CHECK-NEXT:    br label %omp.par.region29
 
-! CHECK:       omp.par.region27:                                 ; preds = %omp.region.cont21
+! CHECK:       omp.par.region29:                                 ; preds = %omp.region.cont23
 !                [call SUM runtime function]
 !                [if (sum(a) == 1)]
-! CHECK:         br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
+! CHECK:         br i1 %{{.*}}, label %omp.par.region30, label %omp.par.region31
 
-! CHECK:       omp.par.region29:                                 ; preds = %omp.par.region27
-! CHECK-NEXT:    br label %omp.region.cont26
+! CHECK:       omp.par.region31:                                 ; preds = %omp.par.region29
+! CHECK-NEXT:    br label %omp.region.cont28
 
-! CHECK:       omp.region.cont26:                                ; preds = %omp.par.region28, %omp.par.region29
+! CHECK:       omp.region.cont28:                                ; preds = %omp.par.region30, %omp.par.region31
 !                [omp parallel region done, call into the runtime to complete reduction]
 ! CHECK:         %[[VAL_233:.*]] = call i32 @__kmpc_reduce(
 ! CHECK:         switch i32 %[[VAL_233]], label %reduce.finalize [
@@ -156,16 +156,16 @@ subroutine worst_case(a, b, c, d)
 ! CHECK-NEXT:      i32 2, label %reduce.switch.atomic
 ! CHECK-NEXT:    ]
 
-! CHECK:       reduce.switch.atomic:                             ; preds = %omp.region.cont26
+! CHECK:       reduce.switch.atomic:                             ; preds = %omp.region.cont28
 ! CHECK-NEXT:    unreachable
 
-! CHECK:       reduce.switch.nonatomic:                          ; preds = %omp.region.cont26
+! CHECK:       reduce.switch.nonatomic:                          ; preds = %omp.region.cont28
 ! CHECK-NEXT:    %[[red_private_value_0:.*]] = load ptr, ptr %{{.*}}, align 8
 ! CHECK-NEXT:    br label %omp.reduction.nonatomic.body
 
 !              [various blocks implementing the reduction]
 
-! CHECK:       omp.region.cont35:                                ; preds =
+! CHECK:       omp.region.cont37:                                ; preds =
 ! CHECK-NEXT:    %{{.*}} = phi ptr
 ! CHECK-NEXT:    call void @__kmpc_end_reduce(
 ! CHECK-NEXT:    br label %reduce.finalize
@@ -179,87 +179,45 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.reduction.cleanup:                            ; preds = %omp.par.pre_finalize
 !                [null check]
-! CHECK:         br i1 %{{.*}}, label %omp.reduction.cleanup41, label %omp.reduction.cleanup42
+! CHECK:         br i1 %{{.*}}, label %omp.reduction.cleanup43, label %omp.reduction.cleanup44
 
-! CHECK:       omp.reduction.cleanup42:                          ; preds = %omp.reduction.cleanup41, %omp.reduction.cleanup
-! CHECK-NEXT:    br label %omp.region.cont40
+! CHECK:       omp.reduction.cleanup44:                          ; preds = %omp.reduction.cleanup43, %omp.reduction.cleanup
+! CHECK-NEXT:    br label %omp.region.cont42
 
-! CHECK:       omp.region.cont40:                                ; preds = %omp.reduction.cleanup42
+! CHECK:       omp.region.cont42:                                ; preds = %omp.reduction.cleanup44
 ! CHECK-NEXT:    %{{.*}} = load ptr, ptr
-! CHECK-NEXT:    br label %omp.reduction.cleanup44
-
-! CHECK:       omp.reduction.cleanup44:                          ; preds = %omp.region.cont40
-!                [null check]
-! CHECK:         br i1 %{{.*}}, label %omp.reduction.cleanup45, label %omp.reduction.cleanup46
-
-! CHECK:       omp.reduction.cleanup46:                          ; preds = %omp.reduction.cleanup45, %omp.reduction.cleanup44
-! CHECK-NEXT:    br label %omp.region.cont43
-
-! CHECK:       omp.region.cont43:                                ; preds = %omp.reduction.cleanup46
-! CHECK-NEXT:    br label %omp.private.dealloc
-
-! CHECK:       omp.private.dealloc:                              ; preds = %omp.region.cont43
-!                [null check]
-! CHECK:         br i1 %{{.*}}, label %omp.private.dealloc48, label %omp.private.dealloc49
-
-! CHECK:       omp.private.dealloc49:                            ; preds = %omp.private.dealloc48, %omp.private.dealloc
-! CHECK-NEXT:    br label %omp.region.cont47
-
-! CHECK:       omp.region.cont47:                                ; preds = %omp.private.dealloc49
-! CHECK-NEXT:    br label %omp.private.dealloc51
-
-! CHECK:       omp.private.dealloc51:                            ; preds = %omp.region.cont47
-!                [null check]
-! CHECK:         br i1 %{{.*}}, label %omp.private.dealloc52, label %omp.private.dealloc53
-
-! CHECK:       omp.private.dealloc53:                            ; preds = %omp.private.dealloc52, %omp.private.dealloc51
-! CHECK-NEXT:    br label %omp.region.cont50
-
-! CHECK:       omp.region.cont50:                                ; preds = %omp.private.dealloc53
-! CHECK-NEXT:    br label %omp.par.outlined.exit.exitStub
-
-! CHECK:       omp.private.dealloc52:                            ; preds = %omp.private.dealloc51
-!                [dealloc memory]
-! CHECK:         br label %omp.private.dealloc53
-
-! CHECK:       omp.private.dealloc48:                            ; preds = %omp.private.dealloc
-!                [dealloc memory]
-! CHECK:         br label %omp.private.dealloc49
-
-! CHECK:       omp.reduction.cleanup45:                          ; preds = %omp.reduction.cleanup44
-! CHECK-NEXT:    call void @free(
 ! CHECK-NEXT:    br label %omp.reduction.cleanup46
 
-! CHECK:       omp.reduction.cleanup41:                          ; preds = %omp.reduction.cleanup
-! CHECK-NEXT:    call void @free(
-! CHECK-NEXT:    br label %omp.reduction.cleanup42
+! CHECK:       omp.reduction.cleanup46:                          ; preds = %omp.region.cont42
+!                [null check]
+! CHECK:         br i1 %{{.*}}, label %omp.reduction.cleanup47, label %omp.reduction.cleanup48
 
-! CHECK:       omp.par.region28:                                 ; preds = %omp.par.region27
+! CHECK:       omp.par.region30:                                 ; preds = %omp.par.region29
 ! CHECK-NEXT:    call void @_FortranAStopStatement
 
-! CHECK:       omp.reduction.neutral23:                          ; preds = %omp.reduction.neutral22
+! CHECK:       omp.reduction.neutral25:                          ; preds = %omp.reduction.neutral24
 !                [source length was zero: finish initializing array]
-! CHECK:         br label %omp.reduction.neutral25
+! CHECK:         br label %omp.reduction.neutral27
 
-! CHECK:       omp.reduction.neutral18:                          ; preds = %omp.reduction.neutral
+! CHECK:       omp.reduction.neutral20:                          ; preds = %omp.reduction.neutral
 !                [source length was zero: finish initializing array]
-! CHECK:         br label %omp.reduction.neutral20
+! CHECK:         br label %omp.reduction.neutral22
 
-! CHECK:       omp.private.copy15:                               ; preds = %omp.private.copy14
+! CHECK:       omp.private.copy17:                               ; preds = %omp.private.copy16
 !                [source length was non-zero: call assign runtime]
-! CHECK:         br label %omp.private.copy16
+! CHECK:         br label %omp.private.copy18
 
-! CHECK:       omp.private.copy11:                               ; preds = %omp.private.copy10
+! CHECK:       omp.private.copy13:                               ; preds = %omp.private.copy12
 !                [source length was non-zero: call assign runtime]
-! CHECK:         br label %omp.private.copy12
+! CHECK:         br label %omp.private.copy14
 
-! CHECK:       omp.private.alloc1:                               ; preds = %omp.private.alloc
+! CHECK:       omp.private.init2:                               ; preds = %omp.private.init1
 !                [var extent was non-zero: malloc a private array]
-! CHECK:         br label %omp.private.alloc3
+! CHECK:         br label %omp.private.init4
 
-! CHECK:       omp.private.alloc6:                               ; preds = %omp.private.alloc5
+! CHECK:       omp.private.init8:                               ; preds = %omp.private.init7
 !                [var extent was non-zero: malloc a private array]
-! CHECK:         br label %omp.private.alloc8
+! CHECK:         br label %omp.private.init10
 
-! CHECK:       omp.par.outlined.exit.exitStub:                   ; preds = %omp.region.cont50
+! CHECK:       omp.par.outlined.exit.exitStub:                   ; preds = %omp.region.cont52
 ! CHECK-NEXT:    ret void
diff --git a/flang/test/Integration/OpenMP/private-global.f90 b/flang/test/Integration/OpenMP/private-global.f90
index 07dbe86e5ec930..39d7e2274cff9a 100644
--- a/flang/test/Integration/OpenMP/private-global.f90
+++ b/flang/test/Integration/OpenMP/private-global.f90
@@ -21,20 +21,24 @@ program bug
 ! CHECK:         %[[VAL_10:.*]] = load i32, ptr %[[VAL_11:.*]], align 4
 ! CHECK:         store i32 %[[VAL_10]], ptr %[[VAL_9]], align 4
 ! CHECK:         %[[VAL_12:.*]] = load i32, ptr %[[VAL_9]], align 4
-! CHECK:         %[[PRIV_TABLE:.*]] = alloca [10 x i32], i64 1, align 4
+! CHECK:         %[[PRIV_BOX_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 ! ...
 ! check that we use the private copy of table for the assignment
 ! CHECK:       omp.par.region1:
 ! CHECK:         %[[ELEMENTAL_TMP:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 ! CHECK:         %[[TABLE_BOX_ADDR:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 ! CHECK:         %[[BOXED_FIFTY:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, align 8
+! CHECK:         %[[FIFTY:.*]] = alloca i32, i64 1, align 4
+! CHECK:         %[[INTERMEDIATE:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 ! CHECK:         %[[TABLE_BOX_ADDR2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
-! CHECK:         %[[TABLE_BOX_VAL:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr undef, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]] {{\[\[}}3 x i64] [i64 1, i64 10, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64)]] }, ptr %[[PRIV_TABLE]], 0
-! CHECK:         store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[TABLE_BOX_VAL]], ptr %[[TABLE_BOX_ADDR]], align 8
-! CHECK :         %[[TABLE_BOX_VAL2:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[TABLE_BOX_ADDR]], align 8
-! CHECK :         store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[TABLE_BOX_VAL2]], ptr %[[TABLE_BOX_ADDR2]], align 8
-! CHECK:         call void @llvm.memcpy.p0.p0.i32(ptr %[[TABLE_BOX_ADDR2]], ptr %[[TABLE_BOX_ADDR]], i32 48, i1 false)
+! CHECK:         call void @llvm.memcpy.p0.p0.i32(ptr %[[INTERMEDIATE]], ptr %[[PRIV_BOX_ALLOC]], i32 48, i1 false)
+! CHECK:         store i32 50, ptr %[[FIFTY]], align 4
+! CHECK:         %[[FIFTY_BOX_VAL:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8 } { ptr undef, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 9, i8 0, i8 0 }, ptr %[[FIFTY]], 0
+! CHECK:         store { ptr, i64, i32, i8, i8, i8, i8 } %[[FIFTY_BOX_VAL]], ptr %[[BOXED_FIFTY]], align 8
+! CHECK:         call void @llvm.memcpy.p0.p0.i32(ptr %[[TABLE_BOX_ADDR2]], ptr %[[INTERMEDIATE]], i32 48, i1 false)
 ! CHECK:         call void @_FortranAAssign(ptr %[[TABLE_BOX_ADDR2]], ptr %[[BOXED_FIFTY]], ptr @{{.*}}, i32 9)
+! CHECK:         call void @llvm.memcpy.p0.p0.i32(ptr %[[TABLE_BOX_ADDR]], ptr %[[PRIV_BOX_ALLOC]], i32 48, i1 false)
+! CHECK:         %[[PRIV_TABLE:.*]] = call ptr @malloc(i64 ptrtoint (ptr getelementptr ([10 x i32], ptr null, i32 1) to i64))
 ! ...
 ! check that we use the private copy of table for table/=50
 ! CHECK:       omp.par.region3:
@@ -43,5 +47,3 @@ program bug
 ! CHECK:         %[[VAL_46:.*]] = mul nsw i64 %[[VAL_45]], 1
 ! CHECK:         %[[VAL_47:.*]] = add nsw i64 %[[VAL_46]], 0
 ! CHECK:         %[[VAL_48:.*]] = getelementptr i32, ptr %[[PRIV_TABLE]], i64 %[[VAL_47]]
-! CHECK:         %[[VAL_49:.*]] = load i32, ptr %[[VAL_48]], align 4
-! CHECK:         %[[VAL_50:.*]] = icmp ne i32 %[[VAL_49]], 50
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/distribute-standalone-private.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/distribute-standalone-private.f90
index 9c2ff8b5284859..8098cd53e9d2ec 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/distribute-standalone-private.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/distribute-standalone-private.f90
@@ -16,8 +16,8 @@ subroutine standalone_distribute
     !$omp end teams
 end subroutine standalone_distribute
 
-! CHECK: omp.private {type = private} @[[I_PRIVATIZER_SYM:.*]] : !fir.ref<i32>
-! CHECK: omp.private {type = private} @[[VAR_PRIVATIZER_SYM:.*]] : !fir.ref<i32>
+! CHECK: omp.private {type = private} @[[I_PRIVATIZER_SYM:.*]] : i32
+! CHECK: omp.private {type = private} @[[VAR_PRIVATIZER_SYM:.*]] : i32
 
 
 ! CHECK-LABEL: func.func @_QPstandalone_distribute() {
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90
index e11525c569ffb8..3e2b3e59018b88 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-allocatable.f90
@@ -16,28 +16,25 @@ end subroutine target_allocatable
 
 ! CHECK-LABEL: omp.private {type = private}
 ! CHECK-SAME:    @[[VAR_PRIVATIZER_SYM:.*]] :
-! CHECK-SAME:      [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
-! CHECK:  ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-! CHECK:    %[[PRIV_ALLOC:.*]] = fir.alloca [[DESC_TYPE:!fir.box<!fir.heap<i32>>]] {bindc_name = "alloc_var", {{.*}}}
+! CHECK-SAME:      [[DESC_TYPE:!fir.box<!fir.heap<i32>>]] init {
+! CHECK:  ^bb0(%[[PRIV_ARG:.*]]: [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]], %[[PRIV_ALLOC:.*]]: [[TYPE]]):
 
 ! CHECK-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]] : [[TYPE]]
 ! CHECK-NEXT:   %[[PRIV_ARG_BOX:.*]] = fir.box_addr %[[PRIV_ARG_VAL]] : ([[DESC_TYPE]]) -> !fir.heap<i32>
 ! CHECK-NEXT:   %[[PRIV_ARG_ADDR:.*]] = fir.convert %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> i64
 ! CHECK-NEXT:   %[[C0:.*]] = arith.constant 0 : i64
-! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi ne, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
+! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi eq, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
 
 ! CHECK-NEXT:   fir.if %[[ALLOC_COND]] {
-! CHECK:          %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32 {fir.must_be_heap = true, {{.*}}}
+! CHECK-NEXT:     %[[ZERO_BOX:.*]] = fir.embox %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> [[DESC_TYPE]]
+! CHECK-NEXT:     fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : [[TYPE]]
+! CHECK-NEXT:   } else {
+! CHECK-NEXT:     %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32
 ! CHECK-NEXT:     %[[PRIV_ALLOCMEM_BOX:.*]] = fir.embox %[[PRIV_ALLOCMEM]] : (!fir.heap<i32>) -> [[DESC_TYPE]]
 ! CHECK-NEXT:     fir.store %[[PRIV_ALLOCMEM_BOX]] to %[[PRIV_ALLOC]] : [[TYPE]]
-! CHECK-NEXT:   } else {
-! CHECK-NEXT:     %[[ZERO_BITS:.*]] = fir.zero_bits !fir.heap<i32>
-! CHECK-NEXT:     %[[ZERO_BOX:.*]] = fir.embox %[[ZERO_BITS]] : (!fir.heap<i32>) -> [[DESC_TYPE]]
-! CHECK-NEXT:     fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : [[TYPE]]
 ! CHECK-NEXT:   }
 
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
-! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : [[TYPE]])
+! CHECK-NEXT:   omp.yield(%[[PRIV_ALLOC]] : [[TYPE]])
 
 ! CHECK-NEXT: } dealloc {
 ! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
@@ -49,12 +46,7 @@ end subroutine target_allocatable
 ! CHECK-NEXT:   %[[PRIV_NULL_COND:.*]] = arith.cmpi ne, %[[PRIV_ADDR_I64]], %[[C0]] : i64
 
 ! CHECK-NEXT:   fir.if %[[PRIV_NULL_COND]] {
-! CHECK:          %[[PRIV_VAL_2:.*]] = fir.load %[[PRIV_ARG]]
-! CHECK-NEXT:     %[[PRIV_ADDR_2:.*]] = fir.box_addr %[[PRIV_VAL_2]]
-! CHECK-NEXT:     fir.freemem %[[PRIV_ADDR_2]]
-! CHECK-NEXT:     %[[ZEROS:.*]] = fir.zero_bits
-! CHECK-NEXT:     %[[ZEROS_BOX:.*]]  = fir.embox %[[ZEROS]]
-! CHECK-NEXT:     fir.store %[[ZEROS_BOX]] to %[[PRIV_ARG]]
+! CHECK-NEXT:     fir.freemem %[[PRIV_ADDR]]
 ! CHECK-NEXT:   }
 
 ! CHECK-NEXT:   omp.yield
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-simple.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-simple.f90
index 3c6836e81abe18..5abf2cbb15c92f 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-simple.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-simple.f90
@@ -15,12 +15,7 @@ subroutine target_simple
 end subroutine target_simple
 
 ! CHECK-LABEL: omp.private {type = private}
-! CHECK-SAME:              @[[VAR_PRIVATIZER_SYM:.*]] : !fir.ref<i32> alloc {
-! CHECK:  ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<i32>):
-! CHECK:    %[[PRIV_ALLOC:.*]] = fir.alloca i32 {bindc_name = "simple_var", {{.*}}}
-! CHECK:    %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
-! CHECK:    omp.yield(%[[PRIV_DECL]]#0 : !fir.ref<i32>)
-! CHECK: }
+! CHECK-SAME:              @[[VAR_PRIVATIZER_SYM:.*]] : i32
 
 ! CHECK-LABEL: func.func @_QPtarget_simple() {
 ! CHECK:  %[[VAR_ALLOC:.*]] = fir.alloca i32 {bindc_name = "simple_var", {{.*}}}
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
index 66fd120085c782..65c218fe9f77b0 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
@@ -13,8 +13,8 @@ subroutine wsloop_private
     end do
 end subroutine wsloop_private
 
-! CHECK: omp.private {type = private} @[[I_PRIVATIZER:.*i_private_ref_i32]]
-! CHECK: omp.private {type = firstprivate} @[[X_PRIVATIZER:.*x_firstprivate_ref_i32]]
+! CHECK: omp.private {type = private} @[[I_PRIVATIZER:.*i_private_i32]]
+! CHECK: omp.private {type = firstprivate} @[[X_PRIVATIZER:.*x_firstprivate_i32]]
 
 ! CHECK: func.func @{{.*}}() {
 ! CHECK:   %[[I_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}i"}
diff --git a/flang/test/Lower/OpenMP/cfg-conversion-omp.private.f90 b/flang/test/Lower/OpenMP/cfg-conversion-omp.private.f90
index 44036492f55957..8b8adf2b140c7f 100644
--- a/flang/test/Lower/OpenMP/cfg-conversion-omp.private.f90
+++ b/flang/test/Lower/OpenMP/cfg-conversion-omp.private.f90
@@ -21,34 +21,27 @@ subroutine delayed_privatization_allocatable
 end subroutine
 
 ! CFGConv-LABEL: omp.private {type = private}
-! CFGConv-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+! CFGConv-SAME: @[[PRIVATIZER_SYM:.*]] : [[BOX_TYPE:!fir.box<!fir.heap<i32>>]] init {
 
-! CFGConv-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-
-! CFGConv-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_allocatableEvar1"}
+! CFGConv-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]], %[[PRIV_ALLOC:.*]]: [[TYPE]]):
 
 ! CFGConv-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CFGConv-NEXT:   %[[PRIV_ARG_BOX:.*]] = fir.box_addr %[[PRIV_ARG_VAL]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
 ! CFGConv-NEXT:   %[[PRIV_ARG_ADDR:.*]] = fir.convert %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> i64
 ! CFGConv-NEXT:   %[[C0:.*]] = arith.constant 0 : i64
-! CFGConv-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi ne, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
-! CFGConv-NEXT:   cf.cond_br %[[ALLOC_COND]], ^[[ALLOC_MEM_BB:.*]], ^[[ZERO_MEM_BB:.*]]
-! CFGConv-NEXT: ^[[ALLOC_MEM_BB]]:
-! CFGConv:        fir.allocmem
+! CFGConv-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi eq, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
+! CFGConv-NEXT:   cf.cond_br %[[ALLOC_COND]], ^[[ZERO_MEM_BB:.*]], ^[[ALLOC_MEM_BB:.*]]
+! CFGConv-NEXT: ^[[ZERO_MEM_BB]]:
 ! CFGConv:        cf.br ^[[DECL_BB:.*]]
-! CFGConv:      ^[[ZERO_MEM_BB]]:
-! CFGConv-NEXT:   fir.zero_bits
+! CFGConv:      ^[[ALLOC_MEM_BB]]:
+! CFGConv:        fir.allocmem
 ! CFGConv:        cf.br ^[[DECL_BB:.*]]
 ! CFGConv-NEXT: ^[[DECL_BB]]:
-! CFGConv-NEXT:   hlfir.declare
 ! CFGConv-NEXT:   omp.yield
 
 
 ! LLVMDialect-LABEL: omp.private {type = private}
-! LLVMDialect-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!llvm.ptr]] alloc {
-
-! LLVMDialect-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-! LLVMDialect:        llvm.alloca
+! LLVMDialect-SAME: @[[PRIVATIZER_SYM:.*]] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> init {
+! LLVMDialect-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !llvm.ptr, %[[PRIV_ALLOC:.*]]: !llvm.ptr):
 ! LLVMDialect:        llvm.call @malloc
-
 ! LLVMDialect-NOT:    hlfir.declare
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
index 759d80cf45b2a4..da093b2e97ef5b 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
@@ -16,38 +16,33 @@ subroutine delayed_privatization_private(var1, l1)
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = firstprivate}
-! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<!fir.array<\?xi32>>>>]] alloc {
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[BOX_TYPE:!fir.box<!fir.heap<!fir.array<\?xi32>>>]] init {
 
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-! CHECK-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<{{\?}}xi32>>> {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_privateEvar1"}
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE:!fir.ref<!fir.box<!fir.heap<!fir.array<\?xi32>>>>]], %[[PRIV_ALLOC:.*]]: [[TYPE]]):
 
 ! CHECK-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]]
 ! CHECK-NEXT:   %[[PRIV_ARG_BOX:.*]] = fir.box_addr %[[PRIV_ARG_VAL]]
 ! CHECK-NEXT:   %[[PRIV_ARG_ADDR:.*]] = fir.convert %[[PRIV_ARG_BOX]]
 ! CHECK-NEXT:   %[[C0:.*]] = arith.constant 0 : i64
-! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi ne, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
+! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi eq, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
 
 ! CHECK-NEXT:   fir.if %[[ALLOC_COND]] {
-! CHECK-NEXT:     %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]] : [[TYPE]]
+! CHECK-NEXT:     %[[EMBOX_2:.*]] = fir.embox %[[PRIV_ARG_BOX]]
+! CHECK-NEXT:     fir.store %[[EMBOX_2]] to %[[PRIV_ALLOC]]
+! CHECK-NEXT:   } else {
 ! CHECK-NEXT:     %[[C0:.*]] = arith.constant 0 : index
 ! CHECK-NEXT:     %[[DIMS:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C0]]
-! CHECK-NEXT:     fir.box_addr %[[PRIV_ARG_VAL]]
-! CHECK-NEXT:     %[[C0_2:.*]] = arith.constant 0 : index 
-! CHECK-NEXT:     %[[CMP:.*]] = arith.cmpi sgt, %[[DIMS]]#1, %[[C0_2]] : index
-! CHECK-NEXT:     %[[SELECT:.*]] = arith.select %[[CMP]], %[[DIMS]]#1, %[[C0_2]] : index
-! CHECK-NEXT:     %[[MEM:.*]] = fir.allocmem !fir.array<?xi32>, %[[SELECT]]
-! CHECK-NEXT:     %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[DIMS]]#0, %[[SELECT]] : (index, index) -> !fir.shapeshift<1>
-! CHECK-NEXT:     %[[EMBOX:.*]] = fir.embox %[[MEM]](%[[SHAPE_SHIFT]])
+! CHECK-NEXT:     %[[SHAPE:.*]] = fir.shape %[[DIMS]]#1
+! CHECK-NEXT:     %[[MEM:.*]] = fir.allocmem !fir.array<?xi32>, %[[DIMS]]#1
+! CHECK-NEXT:     %[[TRUE:.*]] = arith.constant true
+! CHECK-NEXT:     %[[DECL:.*]]:2 = hlfir.declare %[[MEM]](%[[SHAPE]])
+! CHECK-NEXT:     %[[C0_2:.*]] = arith.constant 0 : index
+! CHECK-NEXT:     %[[DIMS_2:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C0_2]]
+! CHECK-NEXT:     %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[DIMS_2]]#0, %[[DIMS_2]]#1
+! CHECK-NEXT:     %[[EMBOX:.*]] = fir.rebox %[[DECL]]#0(%[[SHAPE_SHIFT]])
 ! CHECK-NEXT:     fir.store %[[EMBOX]] to %[[PRIV_ALLOC]]
-! CHECK-NEXT:   } else {
-! CHECK-NEXT:     %[[ZEROS:.*]] = fir.zero_bits
-! CHECK-NEXT:     %[[C0_3:.*]] = arith.constant 0 : index
-! CHECK-NEXT:     %[[SHAPE:.*]] = fir.shape %[[C0_3]] : (index) -> !fir.shape<1>
-! CHECK-NEXT:     %[[EMBOX_2:.*]] = fir.embox %[[ZEROS]](%[[SHAPE]])
-! CHECK-NEXT:     fir.store %[[EMBOX_2]] to %[[PRIV_ALLOC]]
 ! CHECK-NEXT:   }
 
-! CHECK-NEXT:   hlfir.declare
 ! CHECK-NEXT:   omp.yield
 
 ! CHECK-NEXT: } copy {
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
index b3a668018df1d5..01ca1073ae849b 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
@@ -18,9 +18,9 @@ subroutine delayed_privatization_allocatable
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = firstprivate}
-! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[BOX_TYPE:!fir.box<!fir.heap<i32>>]] init {
 
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]], %[[PRIV_ALLOC:.*]]: [[TYPE]]):
 
 ! CHECK: } copy {
 ! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90
index f1fae2540aa4df..4ce66f52110e0e 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90
@@ -15,30 +15,26 @@ subroutine delayed_privatization_allocatable
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = private}
-! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[BOX_TYPE:!fir.box<!fir.heap<i32>>]] init {
 
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-
-! CHECK-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_allocatableEvar1"}
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]], %[[PRIV_ALLOC:.*]]: [[TYPE]]):
 
 ! CHECK-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CHECK-NEXT:   %[[PRIV_ARG_BOX:.*]] = fir.box_addr %[[PRIV_ARG_VAL]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
 ! CHECK-NEXT:   %[[PRIV_ARG_ADDR:.*]] = fir.convert %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> i64
 ! CHECK-NEXT:   %[[C0:.*]] = arith.constant 0 : i64
-! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi ne, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
+! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi eq, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
 
 ! CHECK-NEXT:   fir.if %[[ALLOC_COND]] {
-! CHECK:          %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32 {fir.must_be_heap = true, uniq_name = "_QFdelayed_privatization_allocatableEvar1.alloc"}
+! CHECK-NEXT:     %[[ZERO_BOX:.*]] = fir.embox %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK-NEXT:     fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:   } else {
+! CHECK-NEXT:     %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32
 ! CHECK-NEXT:     %[[PRIV_ALLOCMEM_BOX:.*]] = fir.embox %[[PRIV_ALLOCMEM]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
 ! CHECK-NEXT:     fir.store %[[PRIV_ALLOCMEM_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
-! CHECK-NEXT:   } else {
-! CHECK-NEXT:     %[[ZERO_BITS:.*]] = fir.zero_bits !fir.heap<i32>
-! CHECK-NEXT:     %[[ZERO_BOX:.*]] = fir.embox %[[ZERO_BITS]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
-! CHECK-NEXT:     fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CHECK-NEXT:   }
 
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
-! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : [[TYPE]])
+! CHECK-NEXT:   omp.yield(%[[PRIV_ALLOC]] : [[TYPE]])
 
 ! CHECK-NEXT: } dealloc {
 ! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
@@ -50,12 +46,7 @@ subroutine delayed_privatization_allocatable
 ! CHECK-NEXT:   %[[PRIV_NULL_COND:.*]] = arith.cmpi ne, %[[PRIV_ADDR_I64]], %[[C0]] : i64
 
 ! CHECK-NEXT:   fir.if %[[PRIV_NULL_COND]] {
-! CHECK:          %[[PRIV_VAL_2:.*]] = fir.load %[[PRIV_ARG]]
-! CHECK-NEXT:     %[[PRIV_ADDR_2:.*]] = fir.box_addr %[[PRIV_VAL_2]]
-! CHECK-NEXT:     fir.freemem %[[PRIV_ADDR_2]]
-! CHECK-NEXT:     %[[ZEROS:.*]] = fir.zero_bits
-! CHECK-NEXT:     %[[ZEROS_BOX:.*]]  = fir.embox %[[ZEROS]]
-! CHECK-NEXT:     fir.store %[[ZEROS_BOX]] to %[[PRIV_ARG]]
+! CHECK-NEXT:     fir.freemem %[[PRIV_ADDR]]
 ! CHECK-NEXT:   }
 
 ! CHECK-NEXT:   omp.yield
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-array.f90 b/flang/test/Lower/OpenMP/delayed-privatization-array.f90
index 3d641a0d69689a..95fa3f9e030527 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-array.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-array.f90
@@ -29,20 +29,28 @@ subroutine delayed_privatization_private_1d(var1, l1, u1)
 end subroutine
 
 ! ONE_DIM-LABEL: omp.private {type = firstprivate}
-! ONE_DIM-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.box<!fir.array<\?xi32>>]] alloc {
-
-! ONE_DIM-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-
-! ONE_DIM:   %[[C0:.*]] = arith.constant 0 : index
-! ONE_DIM-NEXT:   %[[DIMS:.*]]:3 = fir.box_dims %[[PRIV_ARG]], %[[C0]] : ([[TYPE]], index) -> (index, index, index)
-! ONE_DIM:   %[[PRIV_ALLOCA:.*]] = fir.alloca !fir.array<{{\?}}xi32>
-! ONE_DIM-NEXT:   %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[DIMS]]#0, %[[DIMS]]#1 : (index, index) -> !fir.shapeshift<1>
-! ONE_DIM-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOCA]](%[[SHAPE_SHIFT]]) {uniq_name = "_QFdelayed_privatization_private_1dEvar1"}
-! ONE_DIM-NEXT:  omp.yield(%[[PRIV_DECL]]#0 : [[TYPE]])
+! ONE_DIM-SAME: @[[PRIVATIZER_SYM:.*]] : [[BOX_TYPE:!fir.box<!fir.array<\?xi32>>]] init {
+
+! ONE_DIM-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE:!fir.ref<!fir.box<!fir.array<\?xi32>>>]], %[[PRIV_BOX_ALLOC:.*]]: [[TYPE]]):
+
+! ONE_DIM-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]]
+! ONE_DIM-NEXT:   %[[C0:.*]] = arith.constant 0 : index
+! ONE_DIM-NEXT:   %[[DIMS:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C0]]
+! ONE_DIM-NEXT:   %[[SHAPE:.*]] = fir.shape %[[DIMS]]#1
+! ONE_DIM-NEXT:   %[[ARRAY_ALLOC:.*]] = fir.allocmem !fir.array<?xi32>, %[[DIMS]]#1
+! ONE_DIM-NEXT:   %[[TRUE:.*]] = arith.constant true
+! ONE_DIM-NEXT:   %[[DECL:.*]]:2 = hlfir.declare %[[ARRAY_ALLOC]](%[[SHAPE]])
+! ONE_DIM-NEXT:   %[[C0_0:.*]] = arith.constant 0
+! ONE_DIM-NEXT:   %[[DIMS2:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C0_0]]
+! ONE_DIM-NEXT:   %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[DIMS2]]#0, %[[DIMS2]]#1
+! ONE_DIM-NEXT:   %[[REBOX:.*]] = fir.rebox %[[DECL]]#0(%[[SHAPE_SHIFT]])
+! ONE_DIM-NEXT:   fir.store %[[REBOX]] to %[[PRIV_BOX_ALLOC]]
+! ONE_DIM-NEXT:   omp.yield(%[[PRIV_BOX_ALLOC]] : [[TYPE]])
 
 ! ONE_DIM-NEXT: } copy {
 ! ONE_DIM-NEXT: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):
-! ONE_DIM-NEXT:  hlfir.assign %[[PRIV_ORIG_ARG]] to %[[PRIV_PRIV_ARG]]
+! ONE_DIM-NEXT:  %[[PRIV_ORIG_ARG_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]] : [[TYPE]]
+! ONE_DIM-NEXT:  hlfir.assign %[[PRIV_ORIG_ARG_VAL]] to %[[PRIV_PRIV_ARG]]
 ! ONE_DIM-NEXT:   omp.yield(%[[PRIV_PRIV_ARG]] : [[TYPE]])
 ! ONE_DIM-NEXT: }
 
@@ -58,24 +66,31 @@ subroutine delayed_privatization_private_2d(var1, l1, u1, l2, u2)
 end subroutine
 
 ! TWO_DIM-LABEL: omp.private {type = firstprivate}
-! TWO_DIM-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.box<!fir.array<\?x\?xi32>>]] alloc {
-
-! TWO_DIM-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-! TWO_DIM:        %[[C0:.*]] = arith.constant 0 : index
-! TWO_DIM-NEXT:   %[[DIMS0:.*]]:3 = fir.box_dims %[[PRIV_ARG]], %[[C0]] : ([[TYPE]], index) -> (index, index, index)
+! TWO_DIM-SAME: @[[PRIVATIZER_SYM:.*]] : [[BOX_TYPE:!fir.box<!fir.array<\?x\?xi32>>]] init {
 
+! TWO_DIM-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE:!fir.ref<!fir.box<!fir.array<\?x\?xi32>>>]], %[[PRIV_BOX_ALLOC:.*]]: [[TYPE]]):
+! TWO_DIM-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]]
+! TWO_DIM-NEXT:   %[[C0:.*]] = arith.constant 0 : index
+! TWO_DIM-NEXT:   %[[DIMS_0:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C0]]
 ! TWO_DIM-NEXT:   %[[C1:.*]] = arith.constant 1 : index
-! TWO_DIM-NEXT:   %[[DIMS1:.*]]:3 = fir.box_dims %[[PRIV_ARG]], %[[C1]] : ([[TYPE]], index) -> (index, index, index)
-
-! TWO_DIM-NEXT:   %[[PRIV_ALLOCA:.*]] = fir.alloca !fir.array<{{\?}}x{{\?}}xi32>, %[[DIMS0]]#1, %[[DIMS1]]#1 {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_private_2dEvar1"}
-! TWO_DIM-NEXT:   %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[DIMS0]]#0, %[[DIMS0]]#1, %[[DIMS1]]#0, %[[DIMS1]]#1 : (index, index, index, index) -> !fir.shapeshift<2>
-
-! TWO_DIM-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOCA]](%[[SHAPE_SHIFT]]) {uniq_name = "_QFdelayed_privatization_private_2dEvar1"}
-! TWO_DIM-NEXT:  omp.yield(%[[PRIV_DECL]]#0 : [[TYPE]])
+! TWO_DIM-NEXT:   %[[DIMS_1:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C1]]
+! TWO_DIM-NEXT:   %[[SHAPE:.*]] = fir.shape %[[DIMS_0]]#1, %[[DIMS_1]]#1
+! TWO_DIM-NEXT:   %[[ARRAY_ALLOC:.*]] = fir.allocmem !fir.array<?x?xi32>, %[[DIMS_0]]#1, %[[DIMS_1]]#1
+! TWO_DIM-NEXT:   %[[TRUE:.*]] = arith.constant true
+! TWO_DIM-NEXT:   %[[DECL:.*]]:2 = hlfir.declare %[[ARRAY_ALLOC]](%[[SHAPE]])
+! TWO_DIM-NEXT:   %[[C0_0:.*]] = arith.constant 0
+! TWO_DIM-NEXT:   %[[DIMS2_0:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C0_0]]
+! TWO_DIM-NEXT:   %[[C1_0:.*]] = arith.constant 1
+! TWO_DIM-NEXT:   %[[DIMS2_1:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C1_0]]
+! TWO_DIM-NEXT:   %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[DIMS2_0]]#0, %[[DIMS2_0]]#1, %[[DIMS2_1]]#0, %[[DIMS2_1]]#1
+! TWO_DIM-NEXT:   %[[REBOX:.*]] = fir.rebox %[[DECL]]#0(%[[SHAPE_SHIFT]])
+! TWO_DIM-NEXT:   fir.store %[[REBOX]] to %[[PRIV_BOX_ALLOC]]
+! TWO_DIM-NEXT:   omp.yield(%[[PRIV_BOX_ALLOC]] : [[TYPE]])
 
 ! TWO_DIM-NEXT: } copy {
 ! TWO_DIM-NEXT: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):
-! TWO_DIM-NEXT:  hlfir.assign %[[PRIV_ORIG_ARG]] to %[[PRIV_PRIV_ARG]]
+! TWO_DIM-NEXT:  %[[PRIV_ORIG_ARG_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]] : [[TYPE]]
+! TWO_DIM-NEXT:  hlfir.assign %[[PRIV_ORIG_ARG_VAL]] to %[[PRIV_PRIV_ARG]]
 ! TWO_DIM-NEXT:   omp.yield(%[[PRIV_PRIV_ARG]] : [[TYPE]])
 ! TWO_DIM-NEXT: }
 
@@ -90,11 +105,18 @@ program main
 end program
 
 ! ONE_DIM_DEFAULT_LB-LABEL: omp.private {type = private}
-! ONE_DIM_DEFAULT_LB-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.array<10xi32>>]] alloc {
-
-! ONE_DIM_DEFAULT_LB-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-
-! ONE_DIM_DEFAULT_LB:   %[[C10:.*]] = arith.constant 10 : index
-! ONE_DIM_DEFAULT_LB:   %[[PRIV_ALLOCA:.*]] = fir.alloca !fir.array<10xi32>
-! ONE_DIM_DEFAULT_LB:   %[[SHAPE:.*]] = fir.shape %[[C10]] : (index) -> !fir.shape<1>
-! ONE_DIM_DEFAULT_LB:   hlfir.declare %[[PRIV_ALLOCA]](%[[SHAPE]])
+! ONE_DIM_DEFAULT_LB-SAME: @[[PRIVATIZER_SYM:.*]] : [[BOX_TYPE:!fir.box<!fir.array<10xi32>>]] init {
+
+! ONE_DIM_DEFAULT_LB-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE:!fir.ref<!fir.box<!fir.array<10xi32>>>]], %[[PRIV_BOX_ALLOC:.*]]: [[TYPE]]):
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]]
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[C10:.*]] = arith.constant 10 : index
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[SHAPE:.*]] = fir.shape %[[C10]]
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[ARRAY_ALLOC:.*]] = fir.allocmem !fir.array<10xi32>
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[TRUE:.*]] = arith.constant true
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[DECL:.*]]:2 = hlfir.declare %[[ARRAY_ALLOC]](%[[SHAPE]])
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[C0_0:.*]] = arith.constant 0
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[DIMS2:.*]]:3 = fir.box_dims %[[PRIV_ARG_VAL]], %[[C0_0]]
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[DIMS2]]#0, %[[DIMS2]]#1
+! ONE_DIM_DEFAULT_LB-NEXT:   %[[EMBOX:.*]] = fir.embox %[[DECL]]#0(%[[SHAPE_SHIFT]])
+! ONE_DIM_DEFAULT_LB-NEXT:   fir.store %[[EMBOX]] to %[[PRIV_BOX_ALLOC]]
+! ONE_DIM_DEFAULT_LB-NEXT:   omp.yield(%[[PRIV_BOX_ALLOC]] : [[TYPE]])
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-character-array.f90 b/flang/test/Lower/OpenMP/delayed-privatization-character-array.f90
index 9a9d0c01212c8d..4c7287283c7ad8 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-character-array.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-character-array.f90
@@ -23,19 +23,14 @@ subroutine delayed_privatization_character_array_static_len(var1)
 end subroutine
 
 ! STATIC_LEN-LABEL: omp.private {type = firstprivate}
-! STATIC_LEN-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.array<5x!fir.char<1,10>>>]] alloc {
+! STATIC_LEN-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.box<!fir.array<5x!fir.char<1,10>>>]] init {
 
-! STATIC_LEN-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-! STATIC_LEN-DAG:    %[[C5:.*]] = arith.constant 5 : index
-! STATIC_LEN-DAG:    %[[C10:.*]] = arith.constant 10 : index
-! STATIC_LEN-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca !fir.array<5x!fir.char<1,10>>
-! STATIC_LEN-NEXT:   %[[ARRAY_SHAPE:.*]] = fir.shape %[[C5]]
-! STATIC_LEN-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]](%[[ARRAY_SHAPE]]) typeparams %[[C10]]
-! STATIC_LEN-NEXT:   omp.yield(%[[PRIV_DECL]]#0
-
-! STATIC_LEN-NEXT: } copy {
-! STATIC_LEN-NEXT: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):
-! STATIC_LEN-NEXT:   hlfir.assign %[[PRIV_ORIG_ARG]] to %[[PRIV_PRIV_ARG]]
+! STATIC_LEN-NEXT: ^bb0(%[[MOLD_REF:.*]]: !fir.ref<[[TYPE]]>, %[[ALLOC:.*]]: !fir.ref<[[TYPE]]>):
+!                    [init region]
+! STATIC_LEN:      } copy {
+! STATIC_LEN-NEXT: ^bb0(%[[PRIV_ORIG_ARG:.*]]: !fir.ref<[[TYPE]]>, %[[PRIV_PRIV_ARG:.*]]: !fir.ref<[[TYPE]]>):
+! STATIC_LEN-NEXT:   %[[ORIG:.*]] = fir.load %[[PRIV_ORIG_ARG]] : !fir.ref<[[TYPE]]>
+! STATIC_LEN-NEXT:   hlfir.assign %[[ORIG]] to %[[PRIV_PRIV_ARG]]
 
 ! STATIC_LEN-NEXT:   omp.yield(%[[PRIV_PRIV_ARG]]
 ! STATIC_LEN-NEXT: }
@@ -53,15 +48,5 @@ subroutine delayed_privatization_character_array_dynamic_len(var1, char_len, arr
 end subroutine
 
 ! DYN_LEN-LABEL: omp.private {type = private}
-! DYN_LEN-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.box<!fir.array<\?x!fir.char<1,\?>>>]] alloc {
-
-! DYN_LEN-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-
-! DYN_LEN:        %[[C0:.*]] = arith.constant 0 : index
-! DYN_LEN-NEXT:   %[[BOX_DIM:.*]]:3 = fir.box_dims %[[PRIV_ARG]], %[[C0]]
-! DYN_LEN:        %[[CHAR_LEN:.*]] = fir.box_elesize %[[PRIV_ARG]]
-! DYN_LEN-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca !fir.array<?x!fir.char<1,?>>(%[[CHAR_LEN]] : index)
-! DYN_LEN-NEXT:   %[[ARRAY_SHAPE:.*]] = fir.shape
-! DYN_LEN-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]](%[[ARRAY_SHAPE]]) typeparams %[[CHAR_LEN]]
-
-! DYN_LEN-NEXT:   omp.yield(%[[PRIV_DECL]]#0
+! DYN_LEN-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.box<!fir.array<\?x!fir.char<1,\?>>>]] init {
+! DYN_LEN-NEXT: ^bb0(%[[MOLD_ARG:.*]]: !fir.ref<!fir.box<!fir.array<?x!fir.char<1,?>>>>, %[[ALLOC_ARG:.*]]: !fir.ref<!fir.box<!fir.array<?x!fir.char<1,?>>>>)
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-firstprivate.f90 b/flang/test/Lower/OpenMP/delayed-privatization-firstprivate.f90
index 119f77ea266269..904ea783ad5b4b 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-firstprivate.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-firstprivate.f90
@@ -15,12 +15,7 @@ subroutine delayed_privatization_firstprivate
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = firstprivate}
-! CHECK-SAME: @[[VAR1_PRIVATIZER_SYM:.*]] : !fir.ref<i32> alloc {
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<i32>):
-! CHECK-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca i32 {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_firstprivateEvar1"}
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]] {uniq_name = "_QFdelayed_privatization_firstprivateEvar1"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : !fir.ref<i32>)
-! CHECK: } copy {
+! CHECK-SAME: @[[VAR1_PRIVATIZER_SYM:.*]] : i32 copy {
 ! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: !fir.ref<i32>, %[[PRIV_PRIV_ARG:.*]]: !fir.ref<i32>):
 ! CHECK:    %[[ORIG_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]] : !fir.ref<i32>
 ! CHECK:    hlfir.assign %[[ORIG_VAL]] to %[[PRIV_PRIV_ARG]] : i32, !fir.ref<i32>
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-private-firstprivate.f90 b/flang/test/Lower/OpenMP/delayed-privatization-private-firstprivate.f90
index 7d202f46c09d30..d961210dcbc381 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-private-firstprivate.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-private-firstprivate.f90
@@ -17,13 +17,11 @@ subroutine delayed_privatization_private_firstprivate
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = firstprivate}
-! CHECK-SAME: @[[VAR2_PRIVATIZER_SYM:.*]] : !fir.ref<i32> alloc {
-! CHECK: } copy {
+! CHECK-SAME: @[[VAR2_PRIVATIZER_SYM:.*]] : i32 copy {
 ! CHECK: }
 
 ! CHECK-LABEL: omp.private {type = private}
-! CHECK-SAME: @[[VAR1_PRIVATIZER_SYM:.*]] : !fir.ref<i32> alloc {
-! CHECK: }
+! CHECK-SAME: @[[VAR1_PRIVATIZER_SYM:.*]] : i32
 
 ! CHECK-LABEL: func.func @_QPdelayed_privatization_private_firstprivate() {
 ! CHECK:  %[[VAR1_ALLOC:.*]] = fir.alloca i32 {bindc_name = "var1"
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-private.f90 b/flang/test/Lower/OpenMP/delayed-privatization-private.f90
index 7208521bcd77e4..69c362e4828bf5 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-private.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-private.f90
@@ -15,12 +15,8 @@ subroutine delayed_privatization_private
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = private}
-! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : !fir.ref<i32> alloc {
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<i32>):
-! CHECK-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca i32 {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_privateEvar1"}
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]] {uniq_name = "_QFdelayed_privatization_privateEvar1"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : !fir.ref<i32>)
-! CHECK-NOT: } copy {
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : i32
+! CHECK-NOT: copy {
 
 ! CHECK-LABEL: @_QPdelayed_privatization_private
 ! CHECK: %[[ORIG_ALLOC:.*]] = fir.alloca i32 {bindc_name = "var1", uniq_name = "_QFdelayed_privatization_privateEvar1"}
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90
index 6c00bb23f15b96..f463f2b4630aef 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90
@@ -19,7 +19,7 @@ subroutine red_and_delayed_private
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = private}
-! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : !fir.ref<i32> alloc {
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : i32
 
 ! CHECK-LABEL: omp.declare_reduction
 ! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> alloc
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-reduction.f90 b/flang/test/Lower/OpenMP/delayed-privatization-reduction.f90
index 38139e52ce95cb..a1ddbc30d6e466 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-reduction.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-reduction.f90
@@ -22,7 +22,7 @@ subroutine red_and_delayed_private
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = private}
-! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : !fir.ref<i32> alloc {
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : i32
 
 ! CHECK-LABEL: omp.declare_reduction
 ! CHECK-SAME: @[[REDUCTION_SYM:.*]] : i32 init
diff --git a/flang/test/Lower/OpenMP/implicit-dsa.f90 b/flang/test/Lower/OpenMP/implicit-dsa.f90
index a1912a46f9ae7e..f0f149bb415b09 100644
--- a/flang/test/Lower/OpenMP/implicit-dsa.f90
+++ b/flang/test/Lower/OpenMP/implicit-dsa.f90
@@ -6,99 +6,82 @@
 ! Privatizers
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = private} @[[TEST6_Y_PRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "y"
-! CHECK-NOT:   } copy {
+! CHECK-SAME:      {type = private} @[[TEST6_Y_PRIV:.*]] : i32
+! CHECK-NOT:   copy {
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = private} @[[TEST6_X_PRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK-NOT:   } copy {
+! CHECK-SAME:      {type = private} @[[TEST6_X_PRIV:.*]] : i32
+! CHECK-NOT:   copy {
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST6_Z_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "z"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST6_Z_FIRSTPRIV:.*]] : i32
+! CHECK-SAME:  copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST6_Y_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "y"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST6_Y_FIRSTPRIV:.*]] : i32
+! CHECK-SAME:  copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST6_X_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST6_X_FIRSTPRIV:.*]] : i32
+! CHECK-SAME:  copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST5_X_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST5_X_FIRSTPRIV:.*]] : i32
+! CHECK-SAME:  copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = private} @[[TEST5_X_PRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK-NOT:   } copy {
+! CHECK-SAME:      {type = private} @[[TEST5_X_PRIV:.*]] : i32
+! CHECK-NOT:   copy {
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST4_Y_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "y"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST4_Y_FIRSTPRIV:.*]] : i32
+! CHECK-SAME:  copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST4_Z_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "z"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST4_Z_FIRSTPRIV:.*]] : i32
+! CHECK-SAME:  copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST4_X_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST4_X_FIRSTPRIV:.*]] : i32
+! CHECK-SAME:  copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = private} @[[TEST4_Y_PRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "y"
-! CHECK-NOT:   } copy {
+! CHECK-SAME:      {type = private} @[[TEST4_Y_PRIV:.*]] : i32
+! CHECK-NOT:   copy {
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = private} @[[TEST4_Z_PRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "z"
-! CHECK-NOT:   } copy {
+! CHECK-SAME:      {type = private} @[[TEST4_Z_PRIV:.*]] : i32
+! CHECK-NOT:   copy {
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = private} @[[TEST4_X_PRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK-NOT:   } copy {
+! CHECK-SAME:      {type = private} @[[TEST4_X_PRIV:.*]] : i32
+! CHECK-NOT:   copy {
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST3_X_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST3_X_FIRSTPRIV:.*]] : i32
+! CHECK:       copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST2_X_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST2_X_FIRSTPRIV:.*]] : i32
+! CHECK:       copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = firstprivate} @[[TEST1_X_FIRSTPRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "x"
-! CHECK:       } copy {
+! CHECK-SAME:      {type = firstprivate} @[[TEST1_X_FIRSTPRIV:.*]] : i32
+! CHECK:       copy {
 ! CHECK:         hlfir.assign
 
 ! CHECK-LABEL: omp.private
-! CHECK-SAME:      {type = private} @[[TEST1_Y_PRIV:.*]] : !fir.ref<i32>
-! CHECK:         fir.alloca i32 {bindc_name = "y"
-! CHECK-NOT:   } copy {
+! CHECK-SAME:      {type = private} @[[TEST1_Y_PRIV:.*]] : i32
+! CHECK-NOT:   copy {
 
 ! Basic cases.
 !CHECK-LABEL: func @_QPimplicit_dsa_test1
diff --git a/flang/test/Lower/OpenMP/loop-directive.f90 b/flang/test/Lower/OpenMP/loop-directive.f90
index 9fa0de3bfe171a..4e2971cbb4aad4 100644
--- a/flang/test/Lower/OpenMP/loop-directive.f90
+++ b/flang/test/Lower/OpenMP/loop-directive.f90
@@ -4,8 +4,8 @@
 ! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s
 
 ! CHECK: omp.declare_reduction @[[RED:add_reduction_i32]] : i32
-! CHECK: omp.private {type = private} @[[DUMMY_PRIV:.*test_privateEdummy_private.*]] : !fir.ref<i32>
-! CHECK: omp.private {type = private} @[[I_PRIV:.*test_no_clausesEi.*]] : !fir.ref<i32>
+! CHECK: omp.private {type = private} @[[DUMMY_PRIV:.*test_privateEdummy_private.*]] : i32
+! CHECK: omp.private {type = private} @[[I_PRIV:.*test_no_clausesEi.*]] : i32
 
 ! CHECK-LABEL: func.func @_QPtest_no_clauses
 subroutine test_no_clauses()
diff --git a/flang/test/Lower/OpenMP/parallel-firstprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-firstprivate-clause-scalar.f90
index f0bee355543a67..a56fce9c89fdc1 100644
--- a/flang/test/Lower/OpenMP/parallel-firstprivate-clause-scalar.f90
+++ b/flang/test/Lower/OpenMP/parallel-firstprivate-clause-scalar.f90
@@ -4,28 +4,18 @@
 ! RUN: bbc -fopenmp -emit-hlfir %s -o - \
 ! RUN: | FileCheck %s --check-prefix=CHECK
 
-!CHECK:  omp.private {type = firstprivate} @[[ARG2_LOGICAL_PRIVATIZER:_QFfirstprivate_logicalEarg2_firstprivate_ref_l8]] : !fir.ref<!fir.logical<1>> alloc
-
-!CHECK:  omp.private {type = firstprivate} @[[ARG1_LOGICAL_PRIVATIZER:_QFfirstprivate_logicalEarg1_firstprivate_ref_l32]] : !fir.ref<!fir.logical<4>> alloc {
-!CHECK:  ^bb0(%{{.*}}: !fir.ref<!fir.logical<4>>):
-!CHECK:    %[[PVT_ALLOC:.*]] = fir.alloca !fir.logical<4> {{.*}}
-!CHECK:    %[[PVT_DECL:.*]]:2 = hlfir.declare %[[PVT_ALLOC]] {{.*}}
-!CHECK:    omp.yield(%[[PVT_DECL]]#0 : !fir.ref<!fir.logical<4>>)
-!CHECK:  } copy {
+!CHECK:  omp.private {type = firstprivate} @[[ARG2_LOGICAL_PRIVATIZER:_QFfirstprivate_logicalEarg2_firstprivate_l8]] : !fir.logical<1>
+
+!CHECK:  omp.private {type = firstprivate} @[[ARG1_LOGICAL_PRIVATIZER:_QFfirstprivate_logicalEarg1_firstprivate_l32]] : !fir.logical<4> copy {
 !CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<!fir.logical<4>>, %[[PVT_REF:.*]]: !fir.ref<!fir.logical<4>>):
 !CHECK:    %[[ORIG_VAL:.*]] = fir.load %[[ORIG_REF]] : {{.*}}
 !CHECK:    hlfir.assign %[[ORIG_VAL]] to %[[PVT_REF]] {{.*}}
 !CHECK:    omp.yield(%[[PVT_REF]] : !fir.ref<!fir.logical<4>>)
 !CHECK:  }
 
-!CHECK:  omp.private {type = firstprivate} @[[ARG2_COMPLEX_PRIVATIZER:_QFfirstprivate_complexEarg2_firstprivate_ref_z64]] : !fir.ref<complex<f64>> alloc
+!CHECK:  omp.private {type = firstprivate} @[[ARG2_COMPLEX_PRIVATIZER:_QFfirstprivate_complexEarg2_firstprivate_z64]] : complex<f64>
 
-!CHECK:  omp.private {type = firstprivate} @[[ARG1_COMPLEX_PRIVATIZER:_QFfirstprivate_complexEarg1_firstprivate_ref_z32]] : !fir.ref<complex<f32>> alloc {
-!CHECK:  ^bb0(%{{.*}}: !fir.ref<complex<f32>>):
-!CHECK:    %[[PVT_ALLOC:.*]] = fir.alloca complex<f32> {bindc_name = "arg1", {{.*}}}
-!CHECK:    %[[PVT_DECL:.*]]:2 = hlfir.declare %[[PVT_ALLOC]] {{.*}}
-!CHECK:    omp.yield(%[[PVT_DECL]]#0 : !fir.ref<complex<f32>>)
-!CHECK:  } copy {
+!CHECK:  omp.private {type = firstprivate} @[[ARG1_COMPLEX_PRIVATIZER:_QFfirstprivate_complexEarg1_firstprivate_z32]] : complex<f32> copy {
 !CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<complex<f32>>, %[[PVT_REF:.*]]: !fir.ref<complex<f32>>):
 !CHECK:    %[[ORIG_VAL:.*]] = fir.load %[[ORIG_REF]] : {{.*}}
 !CHECK:    hlfir.assign %[[ORIG_VAL]] to %[[PVT_REF]] {{.*}}
diff --git a/flang/test/Lower/OpenMP/same_var_first_lastprivate.f90 b/flang/test/Lower/OpenMP/same_var_first_lastprivate.f90
index e8e4a0802e00d2..ee914f23aacf32 100644
--- a/flang/test/Lower/OpenMP/same_var_first_lastprivate.f90
+++ b/flang/test/Lower/OpenMP/same_var_first_lastprivate.f90
@@ -10,11 +10,7 @@ subroutine first_and_lastprivate
   !$omp end parallel do
 end subroutine
 
-! CHECK:  omp.private {type = firstprivate} @{{.*}}Evar_firstprivate_ref_i32 : {{.*}} alloc {
-! CHECK:    %[[ALLOC:.*]] = fir.alloca i32 {{.*}}
-! CHECK:    %[[ALLOC_DECL:.*]]:2 = hlfir.declare %[[ALLOC]]
-! CHECK:    omp.yield(%[[ALLOC_DECL]]#0 : !fir.ref<i32>)
-! CHECK:  } copy {
+! CHECK:  omp.private {type = firstprivate} @{{.*}}Evar_firstprivate_i32 : {{.*}} copy {
 ! CHECK: ^{{.*}}(%[[ORIG_REF:.*]]: {{.*}}, %[[PRIV_REF:.*]]: {{.*}}):
 ! CHECK:    %[[ORIG_VAL:.*]] = fir.load %[[ORIG_REF]]
 ! CHECK:    hlfir.assign %[[ORIG_VAL]] to %[[PRIV_REF]]
@@ -25,7 +21,7 @@ subroutine first_and_lastprivate
 ! CHECK:    %[[ORIG_VAR_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Evar"}
 ! CHECK:    omp.parallel {
 ! CHECK:      omp.barrier
-! CHECK:      omp.wsloop private(@{{.*}}var_firstprivate_ref_i32 {{.*}}) {
+! CHECK:      omp.wsloop private(@{{.*}}var_firstprivate_i32 {{.*}}) {
 ! CHECK:        omp.loop_nest {{.*}} {
 ! CHECK:          %[[PRIV_VAR_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Evar"}
 ! CHECK:          fir.if %{{.*}} {
diff --git a/flang/test/Lower/OpenMP/simd.f90 b/flang/test/Lower/OpenMP/simd.f90
index 0345ace24aaa04..fc3d908801ff1b 100644
--- a/flang/test/Lower/OpenMP/simd.f90
+++ b/flang/test/Lower/OpenMP/simd.f90
@@ -254,7 +254,7 @@ subroutine lastprivate_with_simd
   real :: sum
 
   
-!CHECK: omp.simd private(@_QFlastprivate_with_simdEsum_private_ref_f32 %[[VAR_SUM_DECLARE]]#0 -> %[[VAR_SUM_PINNED:.*]], @{{.*}}) {
+!CHECK: omp.simd private(@_QFlastprivate_with_simdEsum_private_f32 %[[VAR_SUM_DECLARE]]#0 -> %[[VAR_SUM_PINNED:.*]], @{{.*}}) {
 !CHECK: omp.loop_nest (%[[ARG:.*]]) : i32 = ({{.*}} to ({{.*}}) inclusive step ({{.*}}) {
 !CHECK: %[[VAR_SUM_PINNED_DECLARE:.*]]:2 = hlfir.declare %[[VAR_SUM_PINNED]] {{.*}}
 !CHECK: %[[ADD_RESULT:.*]] = arith.addi {{.*}}
diff --git a/flang/test/Lower/OpenMP/task2.f90 b/flang/test/Lower/OpenMP/task2.f90
index 734e75c5bba06f..85f934f109aff9 100644
--- a/flang/test/Lower/OpenMP/task2.f90
+++ b/flang/test/Lower/OpenMP/task2.f90
@@ -2,7 +2,7 @@
 
 
 !CHECK-LABEL: omp.private
-!CHECK-SAME:      {type = firstprivate} @[[PRIVATIZER:.*]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc {
+!CHECK-SAME:      {type = firstprivate} @[[PRIVATIZER:.*]] : !fir.box<!fir.heap<!fir.array<?xi32>>> init {
 !CHECK:         fir.if
 !CHECK:       } copy {
 !CHECK:         fir.if
diff --git a/flang/test/Transforms/generic-loop-rewriting.mlir b/flang/test/Transforms/generic-loop-rewriting.mlir
index a18ea9853602ac..38c20827828f91 100644
--- a/flang/test/Transforms/generic-loop-rewriting.mlir
+++ b/flang/test/Transforms/generic-loop-rewriting.mlir
@@ -1,9 +1,6 @@
 // RUN: fir-opt --omp-generic-loop-conversion %s | FileCheck %s
 
-omp.private {type = private} @_QFtarget_teams_loopEi_private_ref_i32 : !fir.ref<i32> alloc {
-^bb0(%arg0: !fir.ref<i32>):
-  omp.yield(%arg0 : !fir.ref<i32>)
-}
+omp.private {type = private} @_QFtarget_teams_loopEi_private_ref_i32 : i32
 
 func.func @_QPtarget_teams_loop() {
   %i = fir.alloca i32
diff --git a/flang/test/Transforms/omp-maps-for-privatized-symbols.fir b/flang/test/Transforms/omp-maps-for-privatized-symbols.fir
index d32444aaabf237..10a76126ed0547 100644
--- a/flang/test/Transforms/omp-maps-for-privatized-symbols.fir
+++ b/flang/test/Transforms/omp-maps-for-privatized-symbols.fir
@@ -1,12 +1,12 @@
 // RUN: fir-opt --split-input-file --omp-maps-for-privatized-symbols %s | FileCheck %s
 module attributes {omp.is_target_device = false} {
-  omp.private {type = private} @_QFtarget_simpleEsimple_var_private_ref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> alloc {
-  ^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>):
-    %0 = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "simple_var", pinned, uniq_name = "_QFtarget_simpleEsimple_var"}
-    %1 = fir.load %arg0 : !fir.ref<!fir.box<!fir.heap<i32>>>
-    %5:2 = hlfir.declare %0 {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtarget_simpleEsimple_var"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
-    omp.yield(%5#0 : !fir.ref<!fir.box<!fir.heap<i32>>>)
+  omp.private {type = private} @_QFtarget_simpleEsimple_var_private_ref_box_heap_i32 : !fir.box<!fir.heap<i32>> init {
+  ^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>, %arg1: !fir.ref<!fir.box<!fir.heap<i32>>>):
+    %mold = fir.load %arg0 : !fir.ref<!fir.box<!fir.heap<i32>>>
+    // extract box address, see if it is null, etc
+    omp.yield(%arg1 : !fir.ref<!fir.box<!fir.heap<i32>>>)
   }
+
   func.func @_QPtarget_simple() {
     %0 = fir.alloca i32 {bindc_name = "a", uniq_name = "_QFtarget_simpleEa"}
     %1:2 = hlfir.declare %0 {uniq_name = "_QFtarget_simpleEa"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index c5b88904367086..a3a02124ec16bd 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -34,31 +34,27 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
   let description = [{
     This operation provides a declaration of how to implement the
     [first]privatization of a variable. The dialect users should provide
-    information about how to create an instance of the type in the alloc region,
-    how to initialize the copy from the original item in the copy region, and if
-    needed, how to deallocate allocated memory in the dealloc region.
+    the type to allocate for this variable. The allocated variable (usually an
+    alloca) is passed to the `init` region, which performs any remaining
+    initialization (e.g. of Fortran runtime descriptors). The `copy` region
+    describes how to initialize the private copy from the original item, and,
+    if needed, the `dealloc` region describes how to free memory allocated by
+    the `init` region.
 
     Examples:
 
-    * `private(x)` would be emitted as:
+    * `private(x)` for an i32 needs no regions: the standard requires no
+      initialization of i32 variables, and the clause is not firstprivate.
     ```mlir
-    omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc {
-    ^bb0(%arg0: !fir.ref<i32>):
-    %0 = ... allocate proper memory for the private clone ...
-    omp.yield(%0 : !fir.ref<i32>)
-    }
+    omp.private {type = private} @x.privatizer : i32
     ```
 
     * `firstprivate(x)` would be emitted as:
     ```mlir
-    omp.private {type = firstprivate} @x.privatizer : !fir.ref<i32> alloc {
-    ^bb0(%arg0: !fir.ref<i32>):
-    %0 = ... allocate proper memory for the private clone ...
-    omp.yield(%0 : !fir.ref<i32>)
-    } copy {
+    omp.private {type = firstprivate} @x.privatizer : i32 copy {
     ^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
-    // %arg0 is the original host variable. Same as for `alloc`.
-    // %arg1 represents the memory allocated in `alloc`.
+    // %arg0 is the original host variable.
+    // %arg1 represents the memory allocated for this private variable.
     ... copy from host to the privatized clone ....
     omp.yield(%arg1 : !fir.ref<i32>)
     }
@@ -66,20 +62,20 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
 
     * `private(x)` for "allocatables" would be emitted as:
     ```mlir
-    omp.private {type = private} @x.privatizer : !some.type alloc {
-    ^bb0(%arg0: !some.type):
-    %0 = ... allocate proper memory for the private clone ...
-    omp.yield(%0 : !fir.ref<i32>)
+    omp.private {type = private} @x.privatizer : !some.type init {
+    ^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
+    // initialize %arg1, using %arg0 as a mold for allocations
+    omp.yield(%arg1 : !some.pointer<!some.type>)
     } dealloc {
-    ^bb0(%arg0: !some.type):
+    ^bb0(%arg0: !some.pointer<!some.type>):
     ... deallocate allocated memory ...
     omp.yield
     }
     ```
 
     There are no restrictions on the body except for:
-    - The `alloc` & `dealloc` regions have a single argument.
-    - The `copy` region has 2 arguments.
+    - The `dealloc` region has a single argument.
+    - The `init` & `copy` regions have 2 arguments.
     - All three regions are terminated by `omp.yield` ops.
     The above restrictions and other obvious restrictions (e.g. verifying the
     type of yielded values) are verified by the custom op verifier. The actual
@@ -91,7 +87,10 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
     The $sym_name attribute provides a symbol by which the privatizer op can be
     referenced by other dialect ops.
 
-    The $type attribute is the type of the value being privatized.
+    The $type attribute is the type of the value being privatized. Storage for
+    this type is allocated implicitly during MLIR->LLVMIR translation and
+    passed as the second argument to the init region. Therefore, the region
+    arguments should have a type that represents a pointer to $type.
 
     The $data_sharing_type attribute specifies whether privatizer corresponds
     to a `private` or a `firstprivate` clause.
@@ -101,13 +100,13 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
                        TypeAttrOf<AnyType>:$type,
                        DataSharingClauseTypeAttr:$data_sharing_type);
 
-  let regions = (region MinSizedRegion<1>:$alloc_region,
+  let regions = (region AnyRegion:$init_region,
                         AnyRegion:$copy_region,
                         AnyRegion:$dealloc_region);
 
   let assemblyFormat = [{
     $data_sharing_type $sym_name `:` $type
-      `alloc` $alloc_region
+      (`init` $init_region^)?
       (`copy` $copy_region^)?
       (`dealloc` $dealloc_region^)?
       attr-dict
@@ -120,8 +119,13 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
   ];
 
   let extraClassDeclaration = [{
-    BlockArgument getAllocMoldArg() {
-      return getAllocRegion().getArgument(0);
+    BlockArgument getInitMoldArg() {
+      auto &region = getInitRegion();
+      return region.empty() ? nullptr : region.getArgument(0);
+    }
+    BlockArgument getInitPrivateArg() {
+      auto &region = getInitRegion();
+      return region.empty() ? nullptr : region.getArgument(1);
     }
     BlockArgument getCopyMoldArg() {
       auto &region = getCopyRegion();
@@ -141,7 +145,20 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
     /// when an allocatable is privatized. In such cases, the descriptor is used
     /// in privatization and needs to be mapped on to the device.
     bool needsMap() {
-      return !getAllocMoldArg().use_empty();
+      BlockArgument moldArg = getInitMoldArg();
+      return moldArg ? !moldArg.use_empty() : false;
+    }
+
+    /// Get the type of the arguments to the nested regions. This should
+    /// generally be either the same as getType() or a pointer type
+    /// (pointing to the type allocated by this op).
+    /// Returns Type{nullptr} if none of the nested regions have any
+    /// arguments (e.g. all of them are empty).
+    Type getArgType() {
+      for (Region *region : getRegions())
+        for (Type ty : region->getArgumentTypes())
+          return ty;
+      return nullptr;
     }
   }];
 
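To make the new accessor semantics concrete, here is a hypothetical, MLIR-free C++ model of `getArgType()` and `needsMap()` as defined in the diff above. `RegionModel` and `PrivatizerModel` are invented stand-ins (strings instead of `mlir::Type`, booleans instead of use lists), so treat this as a sketch of the logic under those assumptions, not as the MLIR API.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an mlir::Region: only the entry-block argument
// types and whether each argument has uses are modelled. An empty region
// (e.g. an omitted `init`) simply has no arguments.
struct RegionModel {
  std::vector<std::string> argTypes;
  std::vector<bool> argHasUses; // parallel to argTypes
};

// Hypothetical stand-in for omp::PrivateClauseOp.
struct PrivatizerModel {
  std::string type; // the $type attribute: what gets allocated
  RegionModel init, copy, dealloc;

  // Mirrors getArgType(): the first argument type of any region, or
  // std::nullopt (Type{nullptr} in MLIR) when no region has arguments.
  std::optional<std::string> argType() const {
    for (const RegionModel *region : {&init, &copy, &dealloc})
      if (!region->argTypes.empty())
        return region->argTypes.front();
    return std::nullopt;
  }

  // Mirrors needsMap(): the original variable must be mapped only when an
  // init region exists and its mold argument (argument 0) is used.
  bool needsMap() const {
    return !init.argHasUses.empty() && init.argHasUses.front();
  }
};
```

A privatizer with no regions at all has no argument type, which is why `verifyPrivateVarList` now skips the type comparison when `getArgType()` returns null.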
diff --git a/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp b/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
index 5d0003911bca87..0caf3ad1ccf017 100644
--- a/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
+++ b/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
@@ -235,10 +235,10 @@ void mlir::configureOpenMPToLLVMConversionLegality(
   target.addDynamicallyLegalOp<
       omp::AtomicUpdateOp, omp::CriticalOp, omp::DeclareReductionOp,
       omp::DistributeOp, omp::LoopNestOp, omp::LoopOp, omp::MasterOp,
-      omp::OrderedRegionOp, omp::ParallelOp, omp::PrivateClauseOp,
-      omp::SectionOp, omp::SectionsOp, omp::SimdOp, omp::SingleOp,
-      omp::TargetDataOp, omp::TargetOp, omp::TaskgroupOp, omp::TaskloopOp,
-      omp::TaskOp, omp::TeamsOp, omp::WsloopOp>([&](Operation *op) {
+      omp::OrderedRegionOp, omp::ParallelOp, omp::SectionOp, omp::SectionsOp,
+      omp::SimdOp, omp::SingleOp, omp::TargetDataOp, omp::TargetOp,
+      omp::TaskgroupOp, omp::TaskloopOp, omp::TaskOp, omp::TeamsOp,
+      omp::WsloopOp>([&](Operation *op) {
     return std::all_of(op->getRegions().begin(), op->getRegions().end(),
                        [&](Region &region) {
                          return typeConverter.isLegal(&region);
@@ -246,6 +246,16 @@ void mlir::configureOpenMPToLLVMConversionLegality(
            typeConverter.isLegal(op->getOperandTypes()) &&
            typeConverter.isLegal(op->getResultTypes());
   });
+  target.addDynamicallyLegalOp<omp::PrivateClauseOp>(
+      [&](omp::PrivateClauseOp op) -> bool {
+        return std::all_of(op->getRegions().begin(), op->getRegions().end(),
+                           [&](Region &region) {
+                             return typeConverter.isLegal(&region);
+                           }) &&
+               typeConverter.isLegal(op->getOperandTypes()) &&
+               typeConverter.isLegal(op->getResultTypes()) &&
+               typeConverter.isLegal(op.getType());
+      });
 }
 
 void mlir::populateOpenMPToLLVMConversionPatterns(LLVMTypeConverter &converter,
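The dedicated legality callback for `omp::PrivateClauseOp` adds one condition to the generic one: the `$type` attribute itself must be legal. The hypothetical, MLIR-free C++ model below illustrates why, assuming a toy type converter that treats `index` as still needing conversion (as in the `@y.privatizer` test, where `index` becomes `i64`). `TypeConverterModel` and `PrivateOpModel` are invented names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy type converter: a type is "legal" if its spelling is in
// this set of already-converted types.
struct TypeConverterModel {
  std::set<std::string> legal{"i32", "i64", "f32", "!llvm.ptr"};
  bool isLegal(const std::string &ty) const { return legal.count(ty) != 0; }
};

// Hypothetical stand-in for omp.private: its $type attribute plus the
// argument types of each region. omp.private has no operands or results.
struct PrivateOpModel {
  std::string type;
  std::vector<std::vector<std::string>> regionArgTypes;
};

// The generic callback used before this patch only inspected regions,
// operands and results (the latter two are empty here).
bool regionsLegal(const PrivateOpModel &op, const TypeConverterModel &tc) {
  for (const auto &args : op.regionArgTypes)
    for (const std::string &ty : args)
      if (!tc.isLegal(ty))
        return false;
  return true;
}

// The dedicated callback also requires the $type attribute to be legal, so
// an `: index` privatizer is still rewritten even if its regions use !llvm.ptr.
bool privateOpLegal(const PrivateOpModel &op, const TypeConverterModel &tc) {
  return regionsLegal(op, tc) && tc.isLegal(op.type);
}
```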
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 5a619254a5ee14..596d25d881d3b3 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1985,9 +1985,9 @@ static LogicalResult verifyPrivateVarList(OpType &op) {
       return op.emitError() << "failed to lookup privatizer op with symbol: '"
                             << privateSym << "'";
 
-    Type privatizerType = privatizerOp.getType();
+    Type privatizerType = privatizerOp.getArgType();
 
-    if (varType != privatizerType)
+    if (privatizerType && (varType != privatizerType))
       return op.emitError()
              << "type mismatch between a "
              << (privatizerOp.getDataSharingType() ==
@@ -3030,8 +3030,7 @@ void PrivateClauseOp::build(OpBuilder &odsBuilder, OperationState &odsState,
 }
 
 LogicalResult PrivateClauseOp::verifyRegions() {
-  Type symType = getType();
-
+  Type argType = getArgType();
   auto verifyTerminator = [&](Operation *terminator,
                               bool yieldsValue) -> LogicalResult {
     if (!terminator->getBlock()->getSuccessors().empty())
@@ -3052,11 +3051,11 @@ LogicalResult PrivateClauseOp::verifyRegions() {
              << "Did not expect any values to be yielded.";
     }
 
-    if (yieldedTypes.size() == 1 && yieldedTypes.front() == symType)
+    if (yieldedTypes.size() == 1 && yieldedTypes.front() == argType)
       return success();
 
     auto error = mlir::emitError(yieldOp.getLoc())
-                 << "Invalid yielded value. Expected type: " << symType
+                 << "Invalid yielded value. Expected type: " << argType
                  << ", got: ";
 
     if (yieldedTypes.empty())
@@ -3090,18 +3089,27 @@ LogicalResult PrivateClauseOp::verifyRegions() {
     return success();
   };
 
-  if (failed(verifyRegion(getAllocRegion(), /*expectedNumArgs=*/1, "alloc",
+  // Ensure all of the region arguments have the same type
+  for (Region *region : getRegions())
+    for (Type ty : region->getArgumentTypes())
+      if (ty != argType)
+        return emitError() << "Region argument type mismatch: got " << ty
+                           << " expected " << argType << ".";
+
+  mlir::Region &initRegion = getInitRegion();
+  if (!initRegion.empty() &&
+      failed(verifyRegion(getInitRegion(), /*expectedNumArgs=*/2, "init",
                           /*yieldsValue=*/true)))
     return failure();
 
   DataSharingClauseType dsType = getDataSharingType();
 
   if (dsType == DataSharingClauseType::Private && !getCopyRegion().empty())
-    return emitError("`private` clauses require only an `alloc` region.");
+    return emitError("`private` clauses do not require a `copy` region.");
 
   if (dsType == DataSharingClauseType::FirstPrivate && getCopyRegion().empty())
     return emitError(
-        "`firstprivate` clauses require both `alloc` and `copy` regions.");
+        "`firstprivate` clauses require at least a `copy` region.");
 
   if (dsType == DataSharingClauseType::FirstPrivate &&
       failed(verifyRegion(getCopyRegion(), /*expectedNumArgs=*/2, "copy",
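As a reading aid for the reordered checks in `PrivateClauseOp::verifyRegions()`, the following hypothetical C++ model reproduces the argument-type and region-presence rules using plain strings. `std::nullopt` stands for an omitted region; terminator and yielded-type checks are not modelled, the `dealloc` argument-count check is assumed to run last (that part of the function is not shown in this hunk), and the diagnostic strings only mimic the ones expected by `invalid.mlir`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

enum class DataSharing { Private, FirstPrivate };

// std::nullopt models an absent region; otherwise the entry-block arg types.
using RegionArgs = std::optional<std::vector<std::string>>;

struct PrivatizerRegions {
  DataSharing kind;
  RegionArgs init, copy, dealloc;
};

// Returns "" unless a present region has the wrong number of arguments.
static std::string countError(const char *name, std::size_t expected,
                              const RegionArgs &r) {
  if (!r || r->size() == expected)
    return "";
  return std::string("`") + name + "`: expected " + std::to_string(expected) +
         " region arguments, got: " + std::to_string(r->size());
}

// Returns the first diagnostic, or "" if the model verifies.
std::string verifyPrivatizer(const PrivatizerRegions &p) {
  const RegionArgs *regions[] = {&p.init, &p.copy, &p.dealloc};

  // getArgType(): the first argument type of any present region.
  std::string argType;
  for (const RegionArgs *r : regions)
    if (argType.empty() && *r && !(*r)->empty())
      argType = (*r)->front();

  // Every region argument must have that same type.
  for (const RegionArgs *r : regions)
    if (*r)
      for (const std::string &ty : **r)
        if (ty != argType)
          return "Region argument type mismatch: got '" + ty + "' expected '" +
                 argType + "'.";

  if (std::string e = countError("init", 2, p.init); !e.empty())
    return e;
  if (p.kind == DataSharing::Private && p.copy)
    return "`private` clauses do not require a `copy` region.";
  if (p.kind == DataSharing::FirstPrivate && !p.copy)
    return "`firstprivate` clauses require at least a `copy` region.";
  if (std::string e = countError("copy", 2, p.copy); !e.empty())
    return e;
  if (std::string e = countError("dealloc", 1, p.dealloc); !e.empty())
    return e;
  return "";
}
```

Note that a `private` privatizer with no regions at all now verifies, whereas before this patch an `alloc` region was mandatory.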
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 29089cb28a5a8e..1f5bb3d01e682f 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1338,15 +1338,14 @@ findAssociatedValue(Value privateVar, llvm::IRBuilderBase &builder,
 /// Allocate delayed private variables. Returns the basic block which comes
 /// after all of these allocations. llvm::Value * for each of these private
 /// variables are populated in llvmPrivateVars.
-static llvm::Expected<llvm::BasicBlock *>
-allocatePrivateVars(llvm::IRBuilderBase &builder,
-                    LLVM::ModuleTranslation &moduleTranslation,
-                    MutableArrayRef<BlockArgument> privateBlockArgs,
-                    MutableArrayRef<omp::PrivateClauseOp> privateDecls,
-                    MutableArrayRef<mlir::Value> mlirPrivateVars,
-                    llvm::SmallVectorImpl<llvm::Value *> &llvmPrivateVars,
-                    const llvm::OpenMPIRBuilder::InsertPointTy &allocaIP,
-                    llvm::DenseMap<Value, Value> *mappedPrivateVars = nullptr) {
+static llvm::Expected<llvm::BasicBlock *> allocateAndInitPrivateVars(
+    llvm::IRBuilderBase &builder, LLVM::ModuleTranslation &moduleTranslation,
+    MutableArrayRef<BlockArgument> privateBlockArgs,
+    MutableArrayRef<omp::PrivateClauseOp> privateDecls,
+    MutableArrayRef<mlir::Value> mlirPrivateVars,
+    llvm::SmallVectorImpl<llvm::Value *> &llvmPrivateVars,
+    const llvm::OpenMPIRBuilder::InsertPointTy &allocaIP,
+    llvm::DenseMap<Value, Value> *mappedPrivateVars = nullptr) {
   // Allocate private vars
   llvm::BranchInst *allocaTerminator =
       llvm::cast<llvm::BranchInst>(allocaIP.getBlock()->getTerminator());
@@ -1364,57 +1363,51 @@ allocatePrivateVars(llvm::IRBuilderBase &builder,
 
   llvm::BasicBlock *afterAllocas = allocaTerminator->getSuccessor(0);
 
-  // FIXME: Some of the allocation regions do more than just allocating.
-  // They read from their block argument (amongst other non-alloca things).
-  // When OpenMPIRBuilder outlines the parallel region into a different
-  // function it places the loads for live in-values (such as these block
-  // arguments) at the end of the entry block (because the entry block is
-  // assumed to contain only allocas). Therefore, if we put these complicated
-  // alloc blocks in the entry block, these will not dominate the availability
-  // of the live-in values they are using. Fix this by adding a latealloc
-  // block after the entry block to put these in (this also helps to avoid
-  // mixing non-alloca code with allocas).
-  // Alloc regions which do not use the block argument can still be placed in
-  // the entry block (therefore keeping the allocas together).
-  llvm::BasicBlock *privAllocBlock = nullptr;
+  llvm::BasicBlock *privInitBlock = nullptr;
   if (!privateBlockArgs.empty())
-    privAllocBlock = splitBB(builder, true, "omp.private.latealloc");
+    privInitBlock = splitBB(builder, true, "omp.private.init");
   for (auto [privDecl, mlirPrivVar, blockArg] :
        llvm::zip_equal(privateDecls, mlirPrivateVars, privateBlockArgs)) {
-    Region &allocRegion = privDecl.getAllocRegion();
+    llvm::Type *llvmAllocType =
+        moduleTranslation.convertType(privDecl.getType());
+    builder.SetInsertPoint(allocaIP.getBlock()->getTerminator());
+    llvm::Value *llvmPrivateVar = builder.CreateAlloca(
+        llvmAllocType, /*ArraySize=*/nullptr, "omp.private.alloc");
+
+    Region &initRegion = privDecl.getInitRegion();
+    if (initRegion.empty()) {
+      moduleTranslation.mapValue(blockArg, llvmPrivateVar);
+      llvmPrivateVars.push_back(llvmPrivateVar);
+      continue;
+    }
 
-    // map allocation region block argument
+    // map initialization region block arguments
     llvm::Value *nonPrivateVar = findAssociatedValue(
         mlirPrivVar, builder, moduleTranslation, mappedPrivateVars);
     assert(nonPrivateVar);
-    moduleTranslation.mapValue(privDecl.getAllocMoldArg(), nonPrivateVar);
+    moduleTranslation.mapValue(privDecl.getInitMoldArg(), nonPrivateVar);
+    moduleTranslation.mapValue(privDecl.getInitPrivateArg(), llvmPrivateVar);
 
-    // in-place convert the private allocation region
+    // in-place convert the private initialization region
     SmallVector<llvm::Value *, 1> phis;
-    if (privDecl.getAllocMoldArg().getUses().empty()) {
-      // TODO this should use
-      // allocaIP.getBlock()->getFirstNonPHIOrDbgOrAlloca() so it goes before
-      // the code for fetching the thread id. Not doing this for now to avoid
-      // test churn.
-      builder.SetInsertPoint(allocaIP.getBlock()->getTerminator());
-    } else {
-      builder.SetInsertPoint(privAllocBlock->getTerminator());
-    }
-
-    if (failed(inlineConvertOmpRegions(allocRegion, "omp.private.alloc",
-                                       builder, moduleTranslation, &phis)))
+    builder.SetInsertPoint(privInitBlock->getTerminator());
+    if (failed(inlineConvertOmpRegions(initRegion, "omp.private.init", builder,
+                                       moduleTranslation, &phis)))
       return llvm::createStringError(
-          "failed to inline `alloc` region of `omp.private`");
+          "failed to inline `init` region of `omp.private`");
 
     assert(phis.size() == 1 && "expected one allocation to be yielded");
 
+    // prefer the value yielded from the init region to the allocated private
+    // variable in case the region is operating on arguments by-value (e.g.
+    // Fortran character boxes).
     moduleTranslation.mapValue(blockArg, phis[0]);
     llvmPrivateVars.push_back(phis[0]);
 
-    // clear alloc region block argument mapping in case it needs to be
+    // clear init region block argument mapping in case it needs to be
     // re-created with a different source for another use of the same
     // reduction decl
-    moduleTranslation.forgetMapping(allocRegion);
+    moduleTranslation.forgetMapping(initRegion);
   }
   return afterAllocas;
 }
@@ -1758,9 +1751,10 @@ convertOmpTaskOp(omp::TaskOp taskOp, llvm::IRBuilderBase &builder,
     LLVM::ModuleTranslation::SaveStack<OpenMPAllocaStackFrame> frame(
         moduleTranslation, allocaIP);
 
-    llvm::Expected<llvm::BasicBlock *> afterAllocas = allocatePrivateVars(
-        builder, moduleTranslation, privateBlockArgs, privateDecls,
-        mlirPrivateVars, llvmPrivateVars, allocaIP);
+    llvm::Expected<llvm::BasicBlock *> afterAllocas =
+        allocateAndInitPrivateVars(builder, moduleTranslation, privateBlockArgs,
+                                   privateDecls, mlirPrivateVars,
+                                   llvmPrivateVars, allocaIP);
     if (handleError(afterAllocas, *taskOp).failed())
       return llvm::make_error<PreviouslyReportedError>();
 
@@ -1892,7 +1886,7 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder,
   SmallVector<llvm::Value *> privateReductionVariables(
       wsloopOp.getNumReductionVars());
 
-  llvm::Expected<llvm::BasicBlock *> afterAllocas = allocatePrivateVars(
+  llvm::Expected<llvm::BasicBlock *> afterAllocas = allocateAndInitPrivateVars(
       builder, moduleTranslation, privateBlockArgs, privateDecls,
       mlirPrivateVars, llvmPrivateVars, allocaIP);
   if (handleError(afterAllocas, opInst).failed())
@@ -2070,9 +2064,10 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
 
   auto bodyGenCB = [&](InsertPointTy allocaIP,
                        InsertPointTy codeGenIP) -> llvm::Error {
-    llvm::Expected<llvm::BasicBlock *> afterAllocas = allocatePrivateVars(
-        builder, moduleTranslation, privateBlockArgs, privateDecls,
-        mlirPrivateVars, llvmPrivateVars, allocaIP);
+    llvm::Expected<llvm::BasicBlock *> afterAllocas =
+        allocateAndInitPrivateVars(builder, moduleTranslation, privateBlockArgs,
+                                   privateDecls, mlirPrivateVars,
+                                   llvmPrivateVars, allocaIP);
     if (handleError(afterAllocas, *opInst).failed())
       return llvm::make_error<PreviouslyReportedError>();
 
@@ -2256,7 +2251,7 @@ convertOmpSimd(Operation &opInst, llvm::IRBuilderBase &builder,
       findAllocaInsertPoint(builder, moduleTranslation);
   llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder);
 
-  llvm::Expected<llvm::BasicBlock *> afterAllocas = allocatePrivateVars(
+  llvm::Expected<llvm::BasicBlock *> afterAllocas = allocateAndInitPrivateVars(
       builder, moduleTranslation, privateBlockArgs, privateDecls,
       mlirPrivateVars, llvmPrivateVars, allocaIP);
   if (handleError(afterAllocas, opInst).failed())
@@ -4265,9 +4260,10 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
     for (mlir::Value privateVar : targetOp.getPrivateVars())
       mlirPrivateVars.push_back(privateVar);
 
-    llvm::Expected<llvm::BasicBlock *> afterAllocas = allocatePrivateVars(
-        builder, moduleTranslation, privateBlockArgs, privateDecls,
-        mlirPrivateVars, llvmPrivateVars, allocaIP, &mappedPrivateVars);
+    llvm::Expected<llvm::BasicBlock *> afterAllocas =
+        allocateAndInitPrivateVars(
+            builder, moduleTranslation, privateBlockArgs, privateDecls,
+            mlirPrivateVars, llvmPrivateVars, allocaIP, &mappedPrivateVars);
 
     if (failed(handleError(afterAllocas, *targetOp)))
       return llvm::make_error<PreviouslyReportedError>();
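The control flow of the renamed `allocateAndInitPrivateVars()` can be summarised with a hypothetical, MLIR-free C++ model: values are strings, an init region is a callback from (mold, private allocation) to the yielded value, and the `%omp.private.allocN` names are illustrative only. It shows the two paths in the diff: without an `init` region the block argument maps directly to the alloca of `$type`; with one, the region's yielded value is used instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: LLVM values are named by strings.
using Value = std::string;
// An init region, modelled as (mold, private allocation) -> yielded value.
using InitRegion = std::function<Value(const Value &, const Value &)>;

struct PrivatizerDecl {
  std::string type;               // $type: what the translation allocates
  std::optional<InitRegion> init; // std::nullopt when `init` is omitted
};

struct PrivatizationResult {
  std::vector<std::string> allocas; // emitted at the alloca insertion point
  std::vector<Value> privateVars;   // value mapped to each private block arg
};

PrivatizationResult allocateAndInit(const std::vector<PrivatizerDecl> &decls,
                                    const std::vector<Value> &originals) {
  PrivatizationResult result;
  for (std::size_t i = 0; i < decls.size(); ++i) {
    // Always allocate storage for $type first (names are illustrative).
    Value allocated = "%omp.private.alloc" + std::to_string(i);
    result.allocas.push_back(allocated + " = alloca " + decls[i].type);

    if (!decls[i].init) {
      // No init region: the block argument maps straight to the alloca.
      result.privateVars.push_back(allocated);
      continue;
    }
    // Init region: mold = original variable, private = the alloca; the
    // yielded value is preferred over the raw alloca.
    result.privateVars.push_back((*decls[i].init)(originals.at(i), allocated));
  }
  return result;
}
```

Preferring the yielded value is what lets a region that operates on arguments by value (the Fortran character box case mentioned in the diff) hand back something other than the raw alloca.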
diff --git a/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir b/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir
index 4f37dd16b44dd3..6f1ed73e778b43 100644
--- a/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir
+++ b/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir
@@ -507,28 +507,22 @@ llvm.func @_QPtarget_map_with_bounds(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2:
 
 // -----
 
-// CHECK: omp.private {type = private} @x.privatizer : !llvm.struct<{{.*}}> alloc {
-omp.private {type = private} @x.privatizer : memref<?xf32> alloc {
-// CHECK: ^bb0(%arg0: !llvm.struct<{{.*}}>):
-^bb0(%arg0: memref<?xf32>):
+// CHECK: omp.private {type = private} @x.privatizer : !llvm.struct<{{.*}}> init {
+omp.private {type = private} @x.privatizer : memref<?xf32> init {
+// CHECK: ^bb0(%arg0: !llvm.struct<{{.*}}>, %arg1: !llvm.struct<{{.*}}>):
+^bb0(%arg0: memref<?xf32>, %arg1: memref<?xf32>):
   // CHECK: omp.yield(%arg0 : !llvm.struct<{{.*}}>)
   omp.yield(%arg0 : memref<?xf32>)
 }
 
 // -----
 
-// CHECK: omp.private {type = firstprivate} @y.privatizer : i64 alloc {
-omp.private {type = firstprivate} @y.privatizer : index alloc {
-// CHECK: ^bb0(%arg0: i64):
-^bb0(%arg0: index):
-  // CHECK: omp.yield(%arg0 : i64)
-  omp.yield(%arg0 : index)
-// CHECK: } copy {
-} copy {
-// CHECK: ^bb0(%arg0: i64, %arg1: i64):
-^bb0(%arg0: index, %arg1: index):
-  // CHECK: omp.yield(%arg0 : i64)
-  omp.yield(%arg0 : index)
+// CHECK: omp.private {type = firstprivate} @y.privatizer : i64 copy {
+omp.private {type = firstprivate} @y.privatizer : index copy {
+// CHECK: ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+  // CHECK: omp.yield(%arg0 : !llvm.ptr)
+  omp.yield(%arg0 : !llvm.ptr)
 }
 
 // -----
diff --git a/mlir/test/Dialect/OpenMP/invalid.mlir b/mlir/test/Dialect/OpenMP/invalid.mlir
index c611614265592c..1b10ce030bbcc1 100644
--- a/mlir/test/Dialect/OpenMP/invalid.mlir
+++ b/mlir/test/Dialect/OpenMP/invalid.mlir
@@ -2331,8 +2331,8 @@ func.func @omp_distribute_unconstrained_order() -> () {
   return
 }
 // -----
-omp.private {type = private} @x.privatizer : i32 alloc {
-^bb0(%arg0: i32):
+omp.private {type = private} @x.privatizer : i32 init {
+^bb0(%arg0: i32, %arg1: i32):
   %0 = arith.constant 0.0 : f32
   // expected-error @below {{Invalid yielded value. Expected type: 'i32', got: 'f32'}}
   omp.yield(%0 : f32)
@@ -2340,17 +2340,17 @@ omp.private {type = private} @x.privatizer : i32 alloc {
 
 // -----
 
-omp.private {type = private} @x.privatizer : i32 alloc {
-^bb0(%arg0: i32):
-  // expected-error @below {{Invalid yielded value. Expected type: 'i32', got: None}}
+// expected-error @below {{Region argument type mismatch: got 'f32' expected 'i32'.}}
+omp.private {type = private} @x.privatizer : i32 init {
+^bb0(%arg0: i32, %arg1: f32):
   omp.yield
 }
 
 // -----
 
-omp.private {type = private} @x.privatizer : f32 alloc {
-^bb0(%arg0: f32):
-  omp.yield(%arg0 : f32)
+omp.private {type = private} @x.privatizer : f32 init {
+^bb0(%arg0: f32, %arg1: f32):
+  omp.yield(%arg0 : f32)
 } dealloc {
 ^bb0(%arg0: f32):
   // expected-error @below {{Did not expect any values to be yielded.}}
@@ -2359,27 +2359,24 @@ omp.private {type = private} @x.privatizer : f32 alloc {
 
 // -----
 
-omp.private {type = private} @x.privatizer : i32 alloc {
-^bb0(%arg0: i32):
+omp.private {type = private} @x.privatizer : i32 init {
+^bb0(%arg0: i32, %arg1: i32):
   // expected-error @below {{expected exit block terminator to be an `omp.yield` op.}}
   omp.terminator
 }
 
 // -----
 
-// expected-error @below {{`alloc`: expected 1 region arguments, got: 2}}
-omp.private {type = private} @x.privatizer : f32 alloc {
-^bb0(%arg0: f32, %arg1: f32):
+// expected-error @below {{`init`: expected 2 region arguments, got: 1}}
+omp.private {type = private} @x.privatizer : f32 init {
+^bb0(%arg0: f32):
   omp.yield(%arg0 : f32)
 }
 
 // -----
 
 // expected-error @below {{`copy`: expected 2 region arguments, got: 1}}
-omp.private {type = firstprivate} @x.privatizer : f32 alloc {
-^bb0(%arg0: f32):
-  omp.yield(%arg0 : f32)
-} copy {
+omp.private {type = firstprivate} @x.privatizer : f32 copy {
 ^bb0(%arg0: f32):
   omp.yield(%arg0 : f32)
 }
@@ -2387,30 +2384,24 @@ omp.private {type = firstprivate} @x.privatizer : f32 alloc {
 // -----
 
 // expected-error @below {{`dealloc`: expected 1 region arguments, got: 2}}
-omp.private {type = private} @x.privatizer : f32 alloc {
-^bb0(%arg0: f32):
-  omp.yield(%arg0 : f32)
-} dealloc {
+omp.private {type = private} @x.privatizer : f32 dealloc {
 ^bb0(%arg0: f32, %arg1: f32):
   omp.yield
 }
 
 // -----
 
-// expected-error @below {{`private` clauses require only an `alloc` region.}}
-omp.private {type = private} @x.privatizer : f32 alloc {
-^bb0(%arg0: f32):
-  omp.yield(%arg0 : f32)
-} copy {
+// expected-error @below {{`private` clauses do not require a `copy` region.}}
+omp.private {type = private} @x.privatizer : f32 copy {
 ^bb0(%arg0: f32, %arg1 : f32):
   omp.yield(%arg0 : f32)
 }
 
 // -----
 
-// expected-error @below {{`firstprivate` clauses require both `alloc` and `copy` regions.}}
-omp.private {type = firstprivate} @x.privatizer : f32 alloc {
-^bb0(%arg0: f32):
+// expected-error @below {{`firstprivate` clauses require at least a `copy` region.}}
+omp.private {type = firstprivate} @x.privatizer : f32 init {
+^bb0(%arg0: f32, %arg1: f32):
   omp.yield(%arg0 : f32)
 }
 
@@ -2425,8 +2416,8 @@ func.func @private_type_mismatch(%arg0: index) {
   return
 }
 
-omp.private {type = private} @var1.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
+omp.private {type = private} @var1.privatizer : index init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   omp.yield(%arg0 : !llvm.ptr)
 }
 
@@ -2441,10 +2432,7 @@ func.func @firstprivate_type_mismatch(%arg0: index) {
   return
 }
 
-omp.private {type = firstprivate} @var1.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  omp.yield(%arg0 : !llvm.ptr)
-} copy {
+omp.private {type = firstprivate} @var1.privatizer : index copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   omp.yield(%arg0 : !llvm.ptr)
 }
@@ -2472,10 +2460,7 @@ func.func @undefined_privatizer(%arg0: !llvm.ptr) {
 
 // -----
 
-omp.private {type = private} @var1.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  omp.yield(%arg0 : !llvm.ptr)
-} copy {
+omp.private {type = private} @var1.privatizer : !llvm.ptr copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   omp.yield(%arg0 : !llvm.ptr)
 }
diff --git a/mlir/test/Dialect/OpenMP/ops.mlir b/mlir/test/Dialect/OpenMP/ops.mlir
index b1901c333ade8d..c2459222a2be1e 100644
--- a/mlir/test/Dialect/OpenMP/ops.mlir
+++ b/mlir/test/Dialect/OpenMP/ops.mlir
@@ -2662,34 +2662,31 @@ func.func @parallel_op_privatizers(%arg0: !llvm.ptr, %arg1: !llvm.ptr) {
   return
 }
 
-// CHECK-LABEL: omp.private {type = private} @a.privatizer : !llvm.ptr alloc {
-omp.private {type = private} @a.privatizer : !llvm.ptr alloc {
-// CHECK: ^bb0(%{{.*}}: {{.*}}):
-^bb0(%arg0: !llvm.ptr):
+// CHECK-LABEL: omp.private {type = private} @a.privatizer : !llvm.ptr init {
+omp.private {type = private} @a.privatizer : !llvm.ptr init {
+// CHECK: ^bb0(%{{.*}}: {{.*}}, %{{.*}}: {{.*}}):
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   omp.yield(%arg0 : !llvm.ptr)
 }
 
-// CHECK-LABEL: omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
-omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
-// CHECK: ^bb0(%{{.*}}: {{.*}}):
-^bb0(%arg0: !llvm.ptr):
+// CHECK-LABEL: omp.private {type = private} @x.privatizer : !llvm.ptr init {
+omp.private {type = private} @x.privatizer : !llvm.ptr init {
+// CHECK: ^bb0(%{{.*}}: {{.*}}, %{{.*}}: {{.*}}):
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   omp.yield(%arg0 : !llvm.ptr)
+// CHECK: } dealloc {
 } dealloc {
 // CHECK: ^bb0(%{{.*}}: {{.*}}):
 ^bb0(%arg0: !llvm.ptr):
   omp.yield
 }
 
-// CHECK-LABEL: omp.private {type = firstprivate} @y.privatizer : !llvm.ptr alloc {
-omp.private {type = firstprivate} @y.privatizer : !llvm.ptr alloc {
-// CHECK: ^bb0(%{{.*}}: {{.*}}):
-^bb0(%arg0: !llvm.ptr):
-  omp.yield(%arg0 : !llvm.ptr)
-// CHECK: } copy {
-} copy {
+// CHECK-LABEL: omp.private {type = firstprivate} @y.privatizer : !llvm.ptr copy {
+omp.private {type = firstprivate} @y.privatizer : !llvm.ptr copy {
 // CHECK: ^bb0(%{{.*}}: {{.*}}, %{{.*}}: {{.*}}):
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   omp.yield(%arg0 : !llvm.ptr)
+// CHECK: } dealloc {
 } dealloc {
 // CHECK: ^bb0(%{{.*}}: {{.*}}):
 ^bb0(%arg0: !llvm.ptr):
diff --git a/mlir/test/Target/LLVMIR/openmp-firstprivate.mlir b/mlir/test/Target/LLVMIR/openmp-firstprivate.mlir
index 79412fb69f7583..4e27640b478e43 100644
--- a/mlir/test/Target/LLVMIR/openmp-firstprivate.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-firstprivate.mlir
@@ -11,12 +11,7 @@ llvm.func @parallel_op_firstprivate(%arg0: !llvm.ptr) {
   llvm.return
 }
 
-omp.private {type = firstprivate} @x.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %c1 = llvm.mlir.constant(1 : i32) : i32
-  %0 = llvm.alloca %c1 x f32 : (i32) -> !llvm.ptr
-  omp.yield(%0 : !llvm.ptr)
-} copy {
+omp.private {type = firstprivate} @x.privatizer : f32 copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %0 = llvm.load %arg0 : !llvm.ptr -> f32
   llvm.store %0, %arg1 : f32, !llvm.ptr
@@ -63,6 +58,7 @@ llvm.func @parallel_op_firstprivate_multi_block(%arg0: !llvm.ptr) {
 // CHECK: omp.par.entry:
 // CHECK:  %[[ORIG_PTR_PTR:.*]] = getelementptr { ptr }, ptr %{{.*}}, i32 0, i32 0
 // CHECK:  %[[ORIG_PTR:.*]] = load ptr, ptr %[[ORIG_PTR_PTR]], align 8
+// CHECK:  %[[PRIV_ALLOC:.*]] = alloca float, align 4
 // CHECK:   br label %[[PRIV_BB1:.*]]
 
 // CHECK: [[PRIV_BB1]]:
@@ -72,28 +68,26 @@ llvm.func @parallel_op_firstprivate_multi_block(%arg0: !llvm.ptr) {
 // CHECK-NEXT: br label %[[PRIV_BB2:.*]]
 
 // CHECK: [[PRIV_BB2]]:
-// CHECK-NEXT: %[[C1:.*]] = phi i32 [ 1, %[[PRIV_BB1]] ]
-// CHECK-NEXT: %[[PRIV_ALLOC:.*]] = alloca float, i32 %[[C1]], align 4
+// CHECK-NEXT: br label %[[PRIV_BB3:.*]]
+
+// CHECK: [[PRIV_BB3]]:
 // CHECK-NEXT: br label %omp.region.cont
 
 // CHECK: omp.region.cont:
-// CHECK-NEXT: %[[PRIV_ALLOC2:.*]] = phi ptr [ %[[PRIV_ALLOC]], %[[PRIV_BB2]] ]
-// CHECK-NEXT: br label %omp.private.latealloc
-
-// CHECK: omp.private.latealloc:
+// CHECK-NEXT: %[[PRIV_ALLOC2:.*]] = phi ptr [ %[[PRIV_ALLOC]], %[[PRIV_BB3]] ]
 // CHECK-NEXT: br label %omp.private.copy
 
 // CHECK: omp.private.copy:
-// CHECK-NEXT: br label %omp.private.copy3
+// CHECK-NEXT: br label %omp.private.copy4
 
-// CHECK: omp.private.copy3:
+// CHECK: omp.private.copy4:
 // CHECK-NEXT: %[[ORIG_VAL:.*]] = load float, ptr %[[ORIG_PTR]], align 4
 // CHECK-NEXT: br label %[[PRIV_BB3:.*]]
 
 // Check contents of the 2nd block in the `copy` region.
 // CHECK: [[PRIV_BB3]]:
-// CHECK-NEXT: %[[ORIG_VAL2:.*]] = phi float [ %[[ORIG_VAL]], %omp.private.copy3 ]
-// CHECK-NEXT: %[[PRIV_ALLOC3:.*]] = phi ptr [ %[[PRIV_ALLOC2]], %omp.private.copy3 ]
+// CHECK-NEXT: %[[ORIG_VAL2:.*]] = phi float [ %[[ORIG_VAL]], %omp.private.copy4 ]
+// CHECK-NEXT: %[[PRIV_ALLOC3:.*]] = phi ptr [ %[[PRIV_ALLOC2]], %omp.private.copy4 ]
 // CHECK-NEXT: store float %[[ORIG_VAL2]], ptr %[[PRIV_ALLOC3]], align 4
 // CHECK-NEXT: br label %[[PRIV_CONT:.*]]
 
@@ -107,14 +101,12 @@ llvm.func @parallel_op_firstprivate_multi_block(%arg0: !llvm.ptr) {
 // CHECK: [[PAR_REG]]:
 // CHECK:        %{{.*}} = load float, ptr %[[PRIV_ALLOC2]], align 4
 
-omp.private {type = firstprivate} @multi_block.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %c1 = llvm.mlir.constant(1 : i32) : i32
-  llvm.br ^bb1(%c1 : i32)
+omp.private {type = firstprivate} @multi_block.privatizer : f32 init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+  llvm.br ^bb1
 
-^bb1(%arg1: i32):
-  %0 = llvm.alloca %arg1 x f32 : (i32) -> !llvm.ptr
-  omp.yield(%0 : !llvm.ptr)
+^bb1:
+  omp.yield(%arg1 : !llvm.ptr)
 
 } copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
@@ -145,8 +137,8 @@ llvm.func @non_const_len_char_test(%n: !llvm.ptr {fir.bindc_name = "n"}) {
   llvm.return
 }
 
-omp.private {type = firstprivate} @non_const_len_char : !llvm.struct<(ptr, i64)> alloc {
-^bb0(%orig_val: !llvm.struct<(ptr, i64)>):
+omp.private {type = firstprivate} @non_const_len_char : !llvm.struct<(ptr, i64)> init {
+^bb0(%orig_val: !llvm.struct<(ptr, i64)>, %arg1: !llvm.struct<(ptr, i64)>):
   %str_len = llvm.extractvalue %orig_val[1] : !llvm.struct<(ptr, i64)>
   %priv_alloc = llvm.alloca %str_len x i8 {bindc_name = "str", pinned} : (i64) -> !llvm.ptr
   %priv_val = llvm.mlir.undef : !llvm.struct<(ptr, i64)>
@@ -179,12 +171,7 @@ llvm.func @foo()
 // global), then we miss finding that input and we do not privatize the
 // variable.
 
-omp.private {type = firstprivate} @global_privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x f32 {bindc_name = "global", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-} copy {
+omp.private {type = firstprivate} @global_privatizer : f32 copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %0 = llvm.load %arg0 : !llvm.ptr -> f32
   llvm.store %0, %arg1 : f32, !llvm.ptr
@@ -210,6 +197,6 @@ llvm.mlir.global internal @global() {addr_space = 0 : i32} : f32 {
 // CHECK-NEXT:  omp.par.entry:
 // Verify that we found the privatizer by checking that we properly inlined the
 // bodies of the alloc and copy regions.
-// CHECK:         %[[PRIV_ALLOC:.*]] = alloca float, i64 1, align 4
+// CHECK:         %[[PRIV_ALLOC:.*]] = alloca float, align 4
 // CHECK:         %[[GLOB_VAL:.*]] = load float, ptr @global, align 4
 // CHECK:         store float %[[GLOB_VAL]], ptr %[[PRIV_ALLOC]], align 4
diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
index 9868ef227d49e0..8a95793b96fd53 100644
--- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
@@ -2801,12 +2801,7 @@ llvm.func @par_task_(%arg0: !llvm.ptr {fir.bindc_name = "a"}) {
 llvm.func @foo(!llvm.ptr) -> ()
 llvm.func @destroy(!llvm.ptr) -> ()
 
-omp.private {type = firstprivate} @privatizer : !llvm.ptr alloc {
-^bb0(%arg0 : !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x i32 : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-} copy {
+omp.private {type = firstprivate} @privatizer : i32 copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %0 = llvm.load %arg0 : !llvm.ptr -> i32
   llvm.store %0, %arg1 : i32, !llvm.ptr
@@ -2829,11 +2824,11 @@ llvm.func @task(%arg0 : !llvm.ptr) {
 // CHECK:         %[[VAL_11:.*]] = load ptr, ptr %[[VAL_12:.*]], align 8
 // CHECK:         %[[VAL_13:.*]] = getelementptr { ptr }, ptr %[[VAL_11]], i32 0, i32 0
 // CHECK:         %[[VAL_14:.*]] = load ptr, ptr %[[VAL_13]], align 8
-// CHECK:         %[[VAL_15:.*]] = alloca i32, i64 1, align 4
-// CHECK:         br label %omp.private.latealloc
-// CHECK:       omp.private.latealloc:                            ; preds = %task.alloca
+// CHECK:         %[[VAL_15:.*]] = alloca i32, align 4
+// CHECK:         br label %omp.private.init
+// CHECK:       omp.private.init:                                 ; preds = %task.alloca
 // CHECK:         br label %omp.private.copy
-// CHECK:       omp.private.copy:                                 ; preds = %omp.private.latealloc
+// CHECK:       omp.private.copy:                                 ; preds = %omp.private.init
 // CHECK:         %[[VAL_19:.*]] = load i32, ptr %[[VAL_14]], align 4
 // CHECK:         store i32 %[[VAL_19]], ptr %[[VAL_15]], align 4
 // CHECK:         br label %[[VAL_20:.*]]
diff --git a/mlir/test/Target/LLVMIR/openmp-omp.private-dealloc.mlir b/mlir/test/Target/LLVMIR/openmp-omp.private-dealloc.mlir
index 835caccb262c46..97f4980b1782fb 100644
--- a/mlir/test/Target/LLVMIR/openmp-omp.private-dealloc.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-omp.private-dealloc.mlir
@@ -10,12 +10,7 @@ llvm.func @parallel_op_dealloc(%arg0: !llvm.ptr) {
   llvm.return
 }
 
-omp.private {type = firstprivate} @x.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %c1 = llvm.mlir.constant(1 : i32) : i32
-  %0 = llvm.alloca %c1 x f32 : (i32) -> !llvm.ptr
-  omp.yield(%0 : !llvm.ptr)
-} copy {
+omp.private {type = firstprivate} @x.privatizer : f32 copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %0 = llvm.load %arg0 : !llvm.ptr -> f32
   llvm.store %0, %arg1 : f32, !llvm.ptr
diff --git a/mlir/test/Target/LLVMIR/openmp-private.mlir b/mlir/test/Target/LLVMIR/openmp-private.mlir
index d2ca03a8fa027a..07f8416579a58d 100644
--- a/mlir/test/Target/LLVMIR/openmp-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-private.mlir
@@ -60,9 +60,9 @@ llvm.func @parallel_op_2_privates(%arg0: !llvm.ptr, %arg1: !llvm.ptr) {
 
 // Check that the privatizer alloc region was inlined properly.
 // CHECK: %[[PRIV1_ALLOC:.*]] = alloca float, align 4
+// CHECK: %[[PRIV2_ALLOC:.*]] = alloca i32, align 4
 // CHECK: %[[ORIG1_VAL:.*]] = load float, ptr %[[ORIG1_PTR]], align 4
 // CHECK: store float %[[ORIG1_VAL]], ptr %[[PRIV1_ALLOC]], align 4
-// CHECK: %[[PRIV2_ALLOC:.*]] = alloca i32, align 4
 // CHECK: %[[ORIG2_VAL:.*]] = load i32, ptr %[[ORIG2_PTR]], align 4
 // CHECK: store i32 %[[ORIG2_VAL]], ptr %[[PRIV2_ALLOC]], align 4
 // CHECK-NEXT: br
@@ -72,22 +72,20 @@ llvm.func @parallel_op_2_privates(%arg0: !llvm.ptr, %arg1: !llvm.ptr) {
 // CHECK: load i32, ptr %[[PRIV2_ALLOC]], align 4
 // CHECK: }
 
-omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
+omp.private {type = private} @x.privatizer : f32 init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %c1 = llvm.mlir.constant(1 : i32) : i32
-  %0 = llvm.alloca %c1 x f32 : (i32) -> !llvm.ptr
   %1 = llvm.load %arg0 : !llvm.ptr -> f32
-  llvm.store %1, %0 : f32, !llvm.ptr
-  omp.yield(%0 : !llvm.ptr)
+  llvm.store %1, %arg1 : f32, !llvm.ptr
+  omp.yield(%arg1 : !llvm.ptr)
 }
 
-omp.private {type = private} @y.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
+omp.private {type = private} @y.privatizer : i32 init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %c1 = llvm.mlir.constant(1 : i32) : i32
-  %0 = llvm.alloca %c1 x i32 : (i32) -> !llvm.ptr
   %1 = llvm.load %arg0 : !llvm.ptr -> i32
-  llvm.store %1, %0 : i32, !llvm.ptr
-  omp.yield(%0 : !llvm.ptr)
+  llvm.store %1, %arg1 : i32, !llvm.ptr
+  omp.yield(%arg1 : !llvm.ptr)
 }
 
 // -----
@@ -104,17 +102,17 @@ llvm.func @parallel_op_private_multi_block(%arg0: !llvm.ptr) {
 // CHECK: omp.par.entry:
 // CHECK:  %[[ORIG_PTR_PTR:.*]] = getelementptr { ptr }, ptr %{{.*}}, i32 0, i32 0
 // CHECK:  %[[ORIG_PTR:.*]] = load ptr, ptr %[[ORIG_PTR_PTR]], align 8
-// CHECK:  br label %omp.private.latealloc
+// CHECK:  %[[PRIV_ALLOC:.*]] = alloca float, align 4
+// CHECK:  br label %omp.private.init
 
-// CHECK: omp.private.latealloc:
+// CHECK: omp.private.init:
 // CHECK:   br label %[[PRIV_BB1:.*]]
 
-// Check contents of the first block in the `alloc` region.
+// Check contents of the first block in the `init` region.
 // CHECK: [[PRIV_BB1]]:
-// CHECK-NEXT:   %[[PRIV_ALLOC:.*]] = alloca float, align 4
 // CHECK-NEXT:   br label %[[PRIV_BB2:.*]]
 
-// Check contents of the second block in the `alloc` region.
+// Check contents of the second block in the `init` region.
 // CHECK: [[PRIV_BB2]]:
 // CHECK-NEXT:   %[[ORIG_PTR2:.*]] = phi ptr [ %[[ORIG_PTR]], %[[PRIV_BB1]] ]
 // CHECK-NEXT:   %[[PRIV_ALLOC2:.*]] = phi ptr [ %[[PRIV_ALLOC]], %[[PRIV_BB1]] ]
@@ -125,22 +123,23 @@ llvm.func @parallel_op_private_multi_block(%arg0: !llvm.ptr) {
 // Check that the privatizer's continuation block yileds the private clone's
 // address.
 // CHECK: [[PRIV_CONT]]:
-// CHECK-NEXT:   %[[PRIV_ALLOC3:.*]] = phi ptr [ %[[PRIV_ALLOC2]], %[[PRIV_BB2]] ]
+// CHECK-NEXT:   %[[ORIG_PTR3:.*]] = phi ptr [ %[[ORIG_PTR2]], %[[PRIV_BB2]] ]
 // CHECK-NEXT:   br label %[[PAR_REG:.*]]
 
-// Check that the body of the parallel region loads from the private clone.
 // CHECK: [[PAR_REG]]:
-// CHECK:        %{{.*}} = load float, ptr %[[PRIV_ALLOC3]], align 4
+// CHECK-NEXT:   br label %[[PAR_REG2:.*]]
 
-omp.private {type = private} @multi_block.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %c1 = llvm.mlir.constant(1 : i32) : i32
-  %0 = llvm.alloca %c1 x f32 : (i32) -> !llvm.ptr
-  llvm.br ^bb1(%arg0, %0 : !llvm.ptr, !llvm.ptr)
+// Check that the body of the parallel region loads from the private clone.
+// CHECK: [[PAR_REG2]]:
+// CHECK:        %{{.*}} = load float, ptr %[[ORIG_PTR3]], align 4
+
+omp.private {type = private} @multi_block.privatizer : f32 init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+  llvm.br ^bb1(%arg0, %arg1 : !llvm.ptr, !llvm.ptr)
 
-^bb1(%arg1: !llvm.ptr, %arg2: !llvm.ptr):
-  %1 = llvm.load %arg1 : !llvm.ptr -> f32
-  llvm.store %1, %arg2 : f32, !llvm.ptr
+^bb1(%arg2: !llvm.ptr, %arg3: !llvm.ptr):
+  %1 = llvm.load %arg2 : !llvm.ptr -> f32
+  llvm.store %1, %arg3 : f32, !llvm.ptr
   omp.yield(%arg2 : !llvm.ptr)
 }
 
@@ -174,8 +173,8 @@ llvm.func @lower_region_with_addressof() {
   llvm.return
 }
 
-omp.private {type = private} @_QFlower_region_with_addressof_privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
+omp.private {type = private} @_QFlower_region_with_addressof_privatizer : !llvm.ptr init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %0 = llvm.mlir.addressof @_QQfoo: !llvm.ptr
   omp.yield(%0 : !llvm.ptr)
 }
@@ -195,7 +194,7 @@ llvm.func @bar(!llvm.ptr)
 // that we access the different sets of args properly.
 
 // CHECK-LABEL: define internal void @private_and_reduction_..omp_par
-// CHECK-DAG:    %[[PRV_ALLOC:.*]] = alloca float, i64 1, align 4
+// CHECK-DAG:    %[[PRV_ALLOC:.*]] = alloca float, align 4
 // CHECK-DAG:     %[[RED_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 
 // CHECK:         omp.par.region:
@@ -219,12 +218,7 @@ llvm.func @private_and_reduction_() attributes {fir.internal_name = "_QPprivate_
   llvm.return
 }
 
-omp.private {type = private} @privatizer.part : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x f32 {bindc_name = "to_priv", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-}
+omp.private {type = private} @privatizer.part : f32
 
 omp.declare_reduction @reducer.part : !llvm.ptr alloc {
   %0 = llvm.mlir.constant(1 : i64) : i64
@@ -256,12 +250,7 @@ llvm.func @_QPequivalence() {
   llvm.return
 }
 
-omp.private {type = firstprivate} @_QFequivalenceEx_firstprivate_ptr_f32 : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x f32 {bindc_name = "x", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-} copy {
+omp.private {type = firstprivate} @_QFequivalenceEx_firstprivate_ptr_f32 : f32 copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %0 = llvm.load %arg0 : !llvm.ptr -> f32
   llvm.store %0, %arg1 : f32, !llvm.ptr
@@ -270,7 +259,7 @@ omp.private {type = firstprivate} @_QFequivalenceEx_firstprivate_ptr_f32 : !llvm
 
 // CHECK: define internal void @_QPequivalence..omp_par
 // CHECK-NOT: define {{.*}} @{{.*}}
-// CHECK:   %[[PRIV_ALLOC:.*]] = alloca float, i64 1, align 4
+// CHECK:   %[[PRIV_ALLOC:.*]] = alloca float, align 4
 // CHECK:   %[[HOST_VAL:.*]] = load float, ptr %{{.*}}, align 4
 // Test that we initialize the firstprivate variable.
 // CHECK:   store float %[[HOST_VAL]], ptr %[[PRIV_ALLOC]], align 4
diff --git a/mlir/test/Target/LLVMIR/openmp-simd-private.mlir b/mlir/test/Target/LLVMIR/openmp-simd-private.mlir
index 61542aa1aa4d77..40f46103a0ab44 100644
--- a/mlir/test/Target/LLVMIR/openmp-simd-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-simd-private.mlir
@@ -1,17 +1,12 @@
 // RUN: mlir-translate -mlir-to-llvmir -split-input-file %s | FileCheck %s
 
-omp.private {type = private} @i_privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x i32 {bindc_name = "i", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-}
+omp.private {type = private} @i_privatizer : i32
 
 // CHECK-LABEL: test_loop_var_privatization()
 //                Original (non-privatized) allocation for `i`.
 // CHECK:         %{{.*}} = alloca i32, i64 1, align 4
 // CHECK:         %[[DUMMY:.*]] = alloca float, i64 1, align 4
-// CHECK:         %[[PRIV_I:.*]] = alloca i32, i64 1, align 4
+// CHECK:         %[[PRIV_I:.*]] = alloca i32, align 4
 // CHECK:         br label %[[LATE_ALLOC:.*]]
 
 // CHECK:     [[LATE_ALLOC]]:
@@ -78,20 +73,15 @@ llvm.func @test_loop_var_privatization() attributes {fir.internal_name = "_QPtes
   llvm.return
 }
 
-omp.private {type = private} @dummy_privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x f32 {bindc_name = "dummy", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-}
+omp.private {type = private} @dummy_privatizer : f32
 
 // CHECK-LABEL: test_private_clause()
 //                Original (non-privatized) allocation for `i`.
 // CHECK:         %{{.*}} = alloca i32, i64 1, align 4
 //                Original (non-privatized) allocation for `dummy`.
 // CHECK:         %{{.*}} = alloca float, i64 1, align 4
-// CHECK:         %[[PRIV_DUMMY:.*]] = alloca float, i64 1, align 4
-// CHECK:         %[[PRIV_I:.*]] = alloca i32, i64 1, align 4
+// CHECK:         %[[PRIV_DUMMY:.*]] = alloca float, align 4
+// CHECK:         %[[PRIV_I:.*]] = alloca i32, align 4
 
 // CHECK:       omp.simd.region:
 // CHECK-NOT:     br label
diff --git a/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir b/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
index c632a0ee42f8a3..a47955cc28e157 100644
--- a/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
@@ -2,12 +2,7 @@
 
 llvm.func @dealloc_foo_0(!llvm.ptr)
 
-omp.private {type = private} @box.heap_privatizer0 : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i32) : i32
-  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
-  omp.yield(%7 : !llvm.ptr)
-} dealloc {
+omp.private {type = private} @box.heap_privatizer0 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> dealloc {
 ^bb0(%arg0: !llvm.ptr):
   llvm.call @dealloc_foo_0(%arg0) : (!llvm.ptr) -> ()
   omp.yield
@@ -16,12 +11,10 @@ omp.private {type = private} @box.heap_privatizer0 : !llvm.ptr alloc {
 llvm.func @alloc_foo_1(!llvm.ptr)
 llvm.func @dealloc_foo_1(!llvm.ptr)
 
-omp.private {type = private} @box.heap_privatizer1 : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i32) : i32
-  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
+omp.private {type = private} @box.heap_privatizer1 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   llvm.call @alloc_foo_1(%arg0) : (!llvm.ptr) -> ()
-  omp.yield(%7 : !llvm.ptr)
+  omp.yield(%arg1 : !llvm.ptr)
 } dealloc {
 ^bb0(%arg0: !llvm.ptr):
   llvm.call @dealloc_foo_1(%arg0) : (!llvm.ptr) -> ()
diff --git a/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir b/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
index 88b4a6a63c7eb9..a25f920287fe31 100644
--- a/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
@@ -3,12 +3,10 @@
 llvm.func @alloc_foo_1(!llvm.ptr)
 llvm.func @dealloc_foo_1(!llvm.ptr)
 
-omp.private {type = private} @box.heap_privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i32) : i32
-  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
+omp.private {type = private} @box.heap_privatizer : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> init {
+^bb0(%arg0: !llvm.ptr, %arg1 : !llvm.ptr):
   llvm.call @alloc_foo_1(%arg0) : (!llvm.ptr) -> ()
-  omp.yield(%7 : !llvm.ptr)
+  omp.yield(%arg1 : !llvm.ptr)
 } dealloc {
 ^bb0(%arg0: !llvm.ptr):
   llvm.call @dealloc_foo_1(%arg0) : (!llvm.ptr) -> ()
diff --git a/mlir/test/Target/LLVMIR/openmp-target-private.mlir b/mlir/test/Target/LLVMIR/openmp-target-private.mlir
index c9d5f37384a0ba..f97360e2c6e848 100644
--- a/mlir/test/Target/LLVMIR/openmp-target-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-target-private.mlir
@@ -1,11 +1,7 @@
 // RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
 
-omp.private {type = private} @simple_var.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-}
+omp.private {type = private} @simple_var.privatizer : i32
+
 llvm.func @target_map_single_private() attributes {fir.internal_name = "_QPtarget_map_single_private"} {
   %0 = llvm.mlir.constant(1 : i64) : i64
   %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var"} : (i64) -> !llvm.ptr
@@ -24,16 +20,12 @@ llvm.func @target_map_single_private() attributes {fir.internal_name = "_QPtarge
 }
 // CHECK: define internal void @__omp_offloading_
 // CHECK-NOT: define {{.*}}
-// CHECK: %[[PRIV_ALLOC:.*]] = alloca i32, i64 1, align 4
+// CHECK: %[[PRIV_ALLOC:.*]] = alloca i32, align 4
 // CHECK: %[[ADD:.*]] = add i32 {{.*}}, 10
 // CHECK: store i32 %[[ADD]], ptr %[[PRIV_ALLOC]], align 4
 
-omp.private {type = private} @n.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x f32 {bindc_name = "n", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-}
+omp.private {type = private} @n.privatizer : f32
+
 llvm.func @target_map_2_privates() attributes {fir.internal_name = "_QPtarget_map_2_privates"} {
   %0 = llvm.mlir.constant(1 : i64) : i64
   %1 = llvm.alloca %0 x i32 {bindc_name = "simple_var"} : (i64) -> !llvm.ptr
@@ -59,8 +51,8 @@ llvm.func @target_map_2_privates() attributes {fir.internal_name = "_QPtarget_ma
 
 
 // CHECK: define internal void @__omp_offloading_
-// CHECK: %[[PRIV_I32_ALLOC:.*]] = alloca i32, i64 1, align 4
-// CHECK: %[[PRIV_FLOAT_ALLOC:.*]] = alloca float, i64 1, align 4
+// CHECK: %[[PRIV_I32_ALLOC:.*]] = alloca i32, align 4
+// CHECK: %[[PRIV_FLOAT_ALLOC:.*]] = alloca float, align 4
 // CHECK: %[[ADD_I32:.*]] = add i32 {{.*}}, 10
 // CHECK: store i32 %[[ADD_I32]], ptr %[[PRIV_I32_ALLOC]], align 4
 // CHECK: %[[LOAD_I32_AGAIN:.*]] = load i32, ptr %[[PRIV_I32_ALLOC]], align 4
@@ -72,14 +64,12 @@ llvm.func @target_map_2_privates() attributes {fir.internal_name = "_QPtarget_ma
 // privatizers. The idea here is to prove that we set the correct
 // insertion points for the builder when generating, first, LLVM IR for the
 // privatizer and then for the actual target region.
-omp.private {type = private} @multi_block.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %c1 = llvm.mlir.constant(1 : i32) : i32
-  llvm.br ^bb1(%c1 : i32)
+omp.private {type = private} @multi_block.privatizer : f32 init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+  llvm.br ^bb1
 
-^bb1(%arg1: i32):
-  %0 = llvm.alloca %arg1 x f32 : (i32) -> !llvm.ptr
-  omp.yield(%0 : !llvm.ptr)
+^bb1:
+  omp.yield(%arg1 : !llvm.ptr)
 }
 
 llvm.func @target_op_private_multi_block(%arg0: !llvm.ptr) {
@@ -90,8 +80,7 @@ llvm.func @target_op_private_multi_block(%arg0: !llvm.ptr) {
   llvm.return
 }
 // CHECK: define internal void @__omp_offloading_
-// CHECK: %[[ONE:.*]] = phi i32 [ 1, {{.*}} ]
-// CHECK: %[[PRIV_ALLOC:.*]] = alloca float, i32 %[[ONE]], align 4
+// CHECK: %[[PRIV_ALLOC:.*]] = alloca float, align 4
 // CHECK: %[[PHI_ALLOCA:.*]]  = phi ptr [ %[[PRIV_ALLOC]], {{.*}} ]
 // CHECK: %[[RESULT:.*]] = load float, ptr %[[PHI_ALLOCA]], align 4
 
@@ -105,8 +94,8 @@ llvm.func @target_op_private_multi_block(%arg0: !llvm.ptr) {
 // mapped by its pointer whereas the privatizer function expects the descriptor
 // by value. So, we have this test to ensure that the compiler correctly loads
 // from the mapped pointer before passing that to the privatizer function.
-omp.private {type = private} @_QFtarget_boxcharEchar_var_private_boxchar_c8xU : !llvm.struct<(ptr, i64)> alloc {
-^bb0(%arg0: !llvm.struct<(ptr, i64)>):
+omp.private {type = private} @_QFtarget_boxcharEchar_var_private_boxchar_c8xU : !llvm.struct<(ptr, i64)> init {
+^bb0(%arg0: !llvm.struct<(ptr, i64)>, %arg1: !llvm.struct<(ptr, i64)>):
   %0 = llvm.extractvalue %arg0[0] : !llvm.struct<(ptr, i64)>
   %1 = llvm.extractvalue %arg0[1] : !llvm.struct<(ptr, i64)>
   %2 = llvm.mlir.constant(1 : i64) : i64
diff --git a/mlir/test/Target/LLVMIR/openmp-target-simd-on_device.mlir b/mlir/test/Target/LLVMIR/openmp-target-simd-on_device.mlir
index 0ce90578ea9d62..5c971206731e46 100644
--- a/mlir/test/Target/LLVMIR/openmp-target-simd-on_device.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-target-simd-on_device.mlir
@@ -1,8 +1,8 @@
 // RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
 
 module attributes {omp.is_target_device = true} {
-  omp.private {type = private} @simd_privatizer : !llvm.ptr alloc {
-  ^bb0(%arg0: !llvm.ptr):
+  omp.private {type = private} @simd_privatizer : !llvm.ptr init {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
     omp.yield(%arg0 : !llvm.ptr)
   }
 
diff --git a/mlir/test/Target/LLVMIR/openmp-todo.mlir b/mlir/test/Target/LLVMIR/openmp-todo.mlir
index bb2a74841e9afb..2fc003c8f76fac 100644
--- a/mlir/test/Target/LLVMIR/openmp-todo.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-todo.mlir
@@ -112,11 +112,11 @@ llvm.func @sections_allocate(%x : !llvm.ptr) {
 
 // -----
 
-omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i32) : i32
-  %1 = llvm.alloca %0 x i32 : (i32) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
+omp.private {type = private} @x.privatizer : i32 init {
+^bb0(%mold: !llvm.ptr, %private: !llvm.ptr):
+  %c0 = llvm.mlir.constant(0 : i32) : i32
+  llvm.store %c0, %private : i32, !llvm.ptr
+  omp.yield(%private: !llvm.ptr)
 }
 llvm.func @sections_private(%x : !llvm.ptr) {
   // expected-error@below {{not yet implemented: Unhandled clause privatization in omp.sections operation}}
@@ -197,11 +197,11 @@ llvm.func @single_allocate(%x : !llvm.ptr) {
 
 // -----
 
-omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i32) : i32
-  %1 = llvm.alloca %0 x i32 : (i32) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
+omp.private {type = private} @x.privatizer : i32 init {
+^bb0(%mold: !llvm.ptr, %private: !llvm.ptr):
+  %c0 = llvm.mlir.constant(0 : i32) : i32
+  llvm.store %c0, %private : i32, !llvm.ptr
+  omp.yield(%private: !llvm.ptr)
 }
 llvm.func @single_private(%x : !llvm.ptr) {
   // expected-error@below {{not yet implemented: Unhandled clause privatization in omp.single operation}}
@@ -310,12 +310,11 @@ llvm.func @target_is_device_ptr(%x : !llvm.ptr) {
 
 // -----
 
-omp.private {type = firstprivate} @x.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  omp.yield(%arg0 : !llvm.ptr)
-} copy {
-^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
-  omp.yield(%arg0 : !llvm.ptr)
+omp.private {type = firstprivate} @x.privatizer : i32 copy {
+^bb0(%mold: !llvm.ptr, %private: !llvm.ptr):
+  %0 = llvm.load %mold : !llvm.ptr -> i32
+  llvm.store %0, %private : i32, !llvm.ptr
+  omp.yield(%private: !llvm.ptr)
 }
 llvm.func @target_firstprivate(%x : !llvm.ptr) {
   // expected-error@below {{not yet implemented: Unhandled clause firstprivate in omp.target operation}}
@@ -498,11 +497,11 @@ llvm.func @teams_allocate(%x : !llvm.ptr) {
 
 // -----
 
-omp.private {type = private} @x.privatizer : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i32) : i32
-  %1 = llvm.alloca %0 x i32 : (i32) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
+omp.private {type = private} @x.privatizer : i32 init {
+^bb0(%mold: !llvm.ptr, %private: !llvm.ptr):
+  %c0 = llvm.mlir.constant(0 : i32) : i32
+  llvm.store %c0, %private : i32, !llvm.ptr
+  omp.yield(%private: !llvm.ptr)
 }
 llvm.func @teams_private(%x : !llvm.ptr) {
   // expected-error@below {{not yet implemented: Unhandled clause privatization in omp.teams operation}}
diff --git a/mlir/test/Target/LLVMIR/openmp-wsloop-private-cond_br.mlir b/mlir/test/Target/LLVMIR/openmp-wsloop-private-cond_br.mlir
index 4393fafb62efa2..33737c4368a18b 100644
--- a/mlir/test/Target/LLVMIR/openmp-wsloop-private-cond_br.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-wsloop-private-cond_br.mlir
@@ -3,12 +3,7 @@
 // tests firx for test-suite test: pr69183.f90. Makes sure we can handle inling
 // private ops when the alloca block ends with a conditional branch.
 
-omp.private {type = private} @_QFwsloop_privateEi_private_ref_i32 : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x i32 {bindc_name = "i", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-}
+omp.private {type = private} @_QFwsloop_privateEi_private_ref_i32 : i64
 
 llvm.func @wsloop_private_(%arg0: !llvm.ptr {fir.bindc_name = "y"}) attributes {fir.internal_name = "_QPwsloop_private", frame_pointer = #llvm.framePointerKind<all>, target_cpu = "x86-64"} {
   %0 = llvm.mlir.constant(1 : i64) : i64
@@ -35,13 +30,13 @@ llvm.func @wsloop_private_(%arg0: !llvm.ptr {fir.bindc_name = "y"}) attributes {
 }
 
 // CHECK:   %[[INT:.*]] = alloca i32, i64 1, align 4
-// CHECK:   br label %[[LATE_ALLOC_BB:.*]]
+// CHECK:   br label %[[OMP_PRIVATE_INIT:.*]]
 
-// CHECK: [[LATE_ALLOC_BB]]:
+// CHECK: [[OMP_PRIVATE_INIT]]:
 // CHECK:   br label %[[AFTER_ALLOC_BB:.*]]
 
 // CHECK: [[AFTER_ALLOC_BB]]:
-// CHECK:   br i1 false, label %[[BB1:.*]], label %5[[BB2:.*]]
+// CHECK:   br i1 false, label %[[BB1:.*]], label %[[BB2:.*]]
 
 // CHECK: [[BB1]]:
 // CHECK:   br label %[[BB3:.*]]
diff --git a/mlir/test/Target/LLVMIR/openmp-wsloop-private.mlir b/mlir/test/Target/LLVMIR/openmp-wsloop-private.mlir
index db67bb5fae58b0..23a0ae5713aa2e 100644
--- a/mlir/test/Target/LLVMIR/openmp-wsloop-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-wsloop-private.mlir
@@ -3,21 +3,11 @@
 // tests a wsloop private + firstprivate + reduction to make sure block structure
 // is handled properly.
 
-omp.private {type = private} @_QFwsloop_privateEi_private_ref_i32 : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x i32 {bindc_name = "i", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-}
+omp.private {type = private} @_QFwsloop_privateEi_private_ref_i32 : i32
 
 llvm.func @foo_free(!llvm.ptr)
 
-omp.private {type = firstprivate} @_QFwsloop_privateEc_firstprivate_ref_c8 : !llvm.ptr alloc {
-^bb0(%arg0: !llvm.ptr):
-  %0 = llvm.mlir.constant(1 : i64) : i64
-  %1 = llvm.alloca %0 x !llvm.array<1 x i8> {bindc_name = "c", pinned} : (i64) -> !llvm.ptr
-  omp.yield(%1 : !llvm.ptr)
-} copy {
+omp.private {type = firstprivate} @_QFwsloop_privateEc_firstprivate_ref_c8 : !llvm.array<1 x i8> copy {
 ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
   %0 = llvm.load %arg0 : !llvm.ptr -> !llvm.array<1 x i8>
   llvm.store %0, %arg1 : !llvm.array<1 x i8>, !llvm.ptr
@@ -63,8 +53,8 @@ llvm.func @wsloop_private_(%arg0: !llvm.ptr {fir.bindc_name = "y"}) attributes {
 
 // First, check that all memory for privates and reductions is allocated.
 // CHECK: omp.par.entry:
-// CHECK:   %[[CHR:.*]] = alloca [1 x i8], i64 1, align 1
-// CHECK:   %[[INT:.*]] = alloca i32, i64 1, align 4
+// CHECK:   %[[CHR:.*]] = alloca [1 x i8], align 1
+// CHECK:   %[[INT:.*]] = alloca i32, align 4
 // CHECK:   %[[FLT:.*]] = alloca float, align 4
 // CHECK:   %[[RED_ARR:.*]] = alloca [1 x ptr], align 8
 // CHECK:   br label %[[LATE_ALLOC_BB:.*]]

>From ca55879fe79351ae9bedc45f5438ae34a94e2d2e Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Wed, 11 Dec 2024 18:10:24 +0000
Subject: [PATCH 02/12] Fix bug when used with lastprivate

Arrays are boxed and those boxed arrays are passed by reference. This
can cause problems if we try to perform eager privatization (e.g.
lastprivate) on a variable already privatized using delayed
privatization. Work around this by loading the reference to the box.
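
A simplified model of the type check that selects the workaround is sketched
below. It is only an illustration: `Type`, `make` and `choosePath` are
hypothetical stand-ins, not FIR/MLIR APIs. The real code in cloneSymbol uses
nested mlir::dyn_cast on fir::ReferenceType, fir::BoxType and
fir::SequenceType.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified stand-in for FIR types (not the real API).
struct Type {
  enum Kind { Ref, Box, Array, Scalar } kind;
  std::shared_ptr<Type> elem; // element type; null only for Scalar
};
using TypePtr = std::shared_ptr<Type>;

// Builds a type node; every non-scalar kind must wrap an element type.
TypePtr make(Type::Kind kind, TypePtr elem = nullptr) {
  assert((kind == Type::Scalar) == (elem == nullptr) && "bad type nesting");
  return std::make_shared<Type>(Type{kind, std::move(elem)});
}

enum class ClonePath { BoxedArrayWorkaround, HostAssociateClone };

// Mirrors the nested dyn_casts in cloneSymbol: only ref<box<array>> takes
// the workaround; every other address type uses the normal clone.
ClonePath choosePath(const TypePtr &addrTy) {
  if (addrTy && addrTy->kind == Type::Ref) {
    const TypePtr &boxTy = addrTy->elem;
    if (boxTy && boxTy->kind == Type::Box) {
      const TypePtr &arrayTy = boxTy->elem;
      if (arrayTy && arrayTy->kind == Type::Array)
        return ClonePath::BoxedArrayWorkaround;
    }
  }
  return ClonePath::HostAssociateClone;
}
```

Only a reference to a box of an array takes the workaround path; a reference
to a scalar, or a box that is not behind a reference, still goes through
createHostAssociateVarClone.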
---
 .../lib/Lower/OpenMP/DataSharingProcessor.cpp | 57 ++++++++++++++++++-
 ...d-privatization-lastprivate-of-private.f90 | 22 +++++++
 2 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 flang/test/Lower/OpenMP/delayed-privatization-lastprivate-of-private.f90

diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 44ec6b798c7c0d..12c1ed36496790 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -17,6 +17,7 @@
 #include "flang/Lower/ConvertVariable.h"
 #include "flang/Lower/PFTBuilder.h"
 #include "flang/Lower/SymbolMap.h"
+#include "flang/Optimizer/Builder/BoxValue.h"
 #include "flang/Optimizer/Builder/HLFIRTools.h"
 #include "flang/Optimizer/Builder/Todo.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
@@ -94,8 +95,60 @@ void DataSharingProcessor::insertDeallocs() {
 
 void DataSharingProcessor::cloneSymbol(const semantics::Symbol *sym) {
   bool isFirstPrivate = sym->test(semantics::Symbol::Flag::OmpFirstPrivate);
-  bool success = converter.createHostAssociateVarClone(
-      *sym, /*skipDefaultInit=*/isFirstPrivate);
+
+  // If we are doing eager privatization on a symbol that was created using
+  // delayed privatization, the types may be incompatible here, e.g.
+  // fir.ref<fir.box<fir.array<>>>.
+  bool success = false;
+  [&]() {
+    const auto *details =
+        sym->detailsIf<Fortran::semantics::HostAssocDetails>();
+    assert(details && "No host-association found");
+    const Fortran::semantics::Symbol &hsym = details->symbol();
+    mlir::Value addr = converter.getSymbolAddress(hsym);
+
+    if (auto refTy = mlir::dyn_cast<fir::ReferenceType>(addr.getType())) {
+      if (auto boxTy = mlir::dyn_cast<fir::BoxType>(refTy.getElementType())) {
+        if (auto arrayTy =
+                mlir::dyn_cast<fir::SequenceType>(boxTy.getElementType())) {
+          // FirConverter/fir::ExtendedValue considers all references to boxes
+          // as mutable boxes. Outside of OpenMP it doesn't make sense to have a
+          // mutable box of an array. Work around this here by loading the
+          // reference so it is a normal boxed array.
+          fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+          mlir::Location loc = converter.genLocation(hsym.name());
+          fir::ExtendedValue hexv = converter.getSymbolExtendedValue(hsym);
+
+          llvm::SmallVector<mlir::Value> extents =
+              fir::factory::getExtents(loc, builder, hexv);
+
+          // TODO: uniqName, name
+          mlir::Value allocVal =
+              builder.allocateLocal(loc, arrayTy, /*uniqName=*/"",
+                                    /*name=*/"", extents, /*typeParams=*/{},
+                                    sym->GetUltimate().attrs().test(
+                                        Fortran::semantics::Attr::TARGET));
+          mlir::Value shape = builder.genShape(loc, extents);
+          mlir::Value box = builder.createBox(loc, boxTy, allocVal, shape,
+                                              nullptr, {}, nullptr);
+
+          // This can't be a CharArrayBoxValue because otherwise
+          // boxTy.getElementType() would be a character type.
+          // Assume the array element type isn't polymorphic because we are
+          // privatizing.
+          fir::ExtendedValue newExv = fir::ArrayBoxValue{box, extents};
+
+          converter.bindSymbol(*sym, newExv);
+          success = true;
+          return;
+        }
+      }
+    }
+
+    // Normal case:
+    success = converter.createHostAssociateVarClone(
+        *sym, /*skipDefaultInit=*/isFirstPrivate);
+  }();
   (void)success;
   assert(success && "Privatization failed due to existing binding");
 
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-lastprivate-of-private.f90 b/flang/test/Lower/OpenMP/delayed-privatization-lastprivate-of-private.f90
new file mode 100644
index 00000000000000..be075825c5bd6a
--- /dev/null
+++ b/flang/test/Lower/OpenMP/delayed-privatization-lastprivate-of-private.f90
@@ -0,0 +1,22 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s | FileCheck %s
+! RUN: bbc -emit-hlfir -fopenmp -o - %s | FileCheck %s
+
+! Check that we can lower this without crashing
+
+! CHECK: func.func @_QPlastprivate_of_private
+subroutine lastprivate_of_private(a)
+  real :: a(100)
+  integer i
+  ! CHECK: omp.parallel private({{.*}}) {
+  !$omp parallel private(a)
+    ! CHECK: omp.parallel {
+    !$omp parallel shared(a)
+    ! CHECK: omp.wsloop {
+    !$omp do lastprivate(a)
+    ! CHECK: omp.loop_nest
+      do i=1,100
+        a(i) = 1.0
+      end do
+    !$omp end parallel
+  !$omp end parallel
+end subroutine

>From ee3539dddf29f811f41baf606829dea66d7486cb Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Tue, 10 Dec 2024 09:25:22 +0000
Subject: [PATCH 03/12] Support char types

The way boxchars are handled isn't ideal, but !fir.ref<!fir.boxchar<>>
seems to violate a lot of assumptions in the wider ecosystem of (hl)fir
helpers, making it difficult to generate a copy region. I suspect
!fir.ref<!fir.boxchar<>> is not supposed to work (it looks like a
mutable character box, which isn't possible because a boxchar should be
an SSA value). Fixing this would be a big change beyond the scope of
this already too large PR.
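
A `!fir.boxchar` pairs a character address with a length. That is why it behaves like an SSA value rather than something to take a reference to. The plain C++ sketch below is a loose model of the lifecycle the new init and cleanup regions implement for a boxchar privatizer:

1. Unbox the mold.
2. Heap-allocate `len` characters.
3. Re-embox the new address with the same length.
4. Later, unbox and free.

It also models the firstprivate copy. `BoxChar`, `privateInit`, `privateCopy` and `privateDealloc` are hypothetical names. This is not flang or Fortran runtime code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for !fir.boxchar<1>: a character address plus its
// length, passed around by value (like an SSA value). Not a flang type.
struct BoxChar {
  char *addr;
  std::size_t len;
};

// Models the `init` region: take the length from the mold, allocate that many
// characters on the heap (so the storage does not depend on the current
// stack frame), and embox the new address with the same length.
BoxChar privateInit(BoxChar mold) {
  // malloc(0) may return nullptr, so always request at least one byte.
  void *storage = std::malloc(mold.len == 0 ? 1 : mold.len);
  assert(storage && "allocation failed");
  return BoxChar{static_cast<char *>(storage), mold.len};
}

// Models the firstprivate `copy` region: copy the original characters in.
void privateCopy(BoxChar from, BoxChar to) {
  assert(from.len == to.len);
  std::memcpy(to.addr, from.addr, from.len);
}

// Models the `dealloc` (cleanup) region: unbox and free the heap storage.
void privateDealloc(BoxChar priv) { std::free(priv.addr); }
```

The heap allocation is the point of contention described above. The length is only known at run time from the mold, so the storage cannot simply be implied by the omp.private type.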
---
 .../lib/Lower/OpenMP/DataSharingProcessor.cpp |  7 +--
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 62 +++++++++++++++++--
 .../delayed-privatization-character.f90       | 17 ++---
 .../OpenMP/parallel-private-clause-fixes.f90  | 35 ++++-------
 .../OpenMP/parallel-private-clause-str.f90    | 31 ++++------
 .../test/Lower/OpenMP/private-commonblock.f90 | 23 ++++---
 6 files changed, 105 insertions(+), 70 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 12c1ed36496790..25afcb0b325e3d 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -563,15 +563,14 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
 
     // Populate the `init` region.
     const bool needsInitialization =
-        !fir::isa_trivial(allocType) ||
-        Fortran::lower::hasDefaultInitialization(sym->GetUltimate();
+        Fortran::lower::hasDefaultInitialization(sym->GetUltimate()) ||
+        mlir::isa<fir::BaseBoxType>(allocType) ||
+        mlir::isa<fir::BoxCharType>(allocType);
     if (needsInitialization) {
       mlir::Region &initRegion = result.getInitRegion();
       mlir::Block *initBlock = firOpBuilder.createBlock(
           &initRegion, /*insertPt=*/{}, {argType, argType}, {symLoc, symLoc});
 
-      if (fir::isa_char(allocType))
-        TODO(symLoc, "Privatization init of characters");
       if (fir::isa_derived(allocType))
         TODO(symLoc, "Privatization init of derived types");
       if (Fortran::lower::hasDefaultInitialization(sym->GetUltimate()))
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 83f0d4e93ca548..0835030dbbeb43 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -12,9 +12,13 @@
 
 #include "PrivateReductionUtils.h"
 
+#include "flang/Optimizer/Builder/BoxValue.h"
+#include "flang/Optimizer/Builder/Character.h"
 #include "flang/Optimizer/Builder/FIRBuilder.h"
 #include "flang/Optimizer/Builder/HLFIRTools.h"
 #include "flang/Optimizer/Builder/Todo.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/Support/FatalError.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
@@ -66,6 +70,21 @@ static void createCleanupRegion(fir::FirOpBuilder &builder, mlir::Location loc,
     return;
   }
 
+  if (auto boxCharTy = mlir::dyn_cast<fir::BoxCharType>(argType)) {
+    auto [addr, len] =
+        fir::factory::CharacterExprHelper{builder, loc}.createUnboxChar(
+            block->getArgument(0));
+
+    // convert addr to a heap type so it can be used with fir::FreeMemOp
+    auto refTy = mlir::cast<fir::ReferenceType>(addr.getType());
+    auto heapTy = fir::HeapType::get(refTy.getEleTy());
+    addr = builder.createConvert(loc, heapTy, addr);
+
+    builder.create<fir::FreeMemOp>(loc, addr);
+    builder.create<mlir::omp::YieldOp>(loc);
+    return;
+  }
+
   typeError();
 }
 
@@ -129,6 +148,8 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
   // }
   // omp.yield %box_alloca
   moldArg = builder.loadIfRef(loc, moldArg);
+  mlir::SmallVector<mlir::Value> lenParams;
+  hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg}, lenParams);
   auto handleNullAllocatable = [&](mlir::Value boxAlloca) -> fir::IfOp {
     mlir::Value addr = builder.create<fir::BoxAddrOp>(loc, moldArg);
     mlir::Value isNotAllocated = builder.genIsNullAddr(loc, addr);
@@ -136,7 +157,9 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
                                                /*withElseRegion=*/true);
     builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
     // just embox the null address and return
-    mlir::Value nullBox = builder.create<fir::EmboxOp>(loc, ty, addr);
+    mlir::Value nullBox =
+        builder.create<fir::EmboxOp>(loc, ty, addr, /*shape=*/mlir::Value{},
+                                     /*slice=*/mlir::Value{}, lenParams);
     builder.create<fir::StoreOp>(loc, nullBox, boxAlloca);
     return ifOp;
   };
@@ -149,7 +172,8 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     builder.setInsertionPointToEnd(initBlock);
     mlir::Value boxAlloca = allocatedPrivVarArg;
     mlir::Type innerTy = fir::unwrapRefType(boxTy.getEleTy());
-    if (fir::isa_trivial(innerTy)) {
+    bool isChar = fir::isa_char(innerTy);
+    if (fir::isa_trivial(innerTy) || isChar) {
       // boxed non-sequence value e.g. !fir.box<!fir.heap<i32>>
       if (!isAllocatableOrPointer)
         TODO(loc,
@@ -158,10 +182,13 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
       fir::IfOp ifUnallocated = handleNullAllocatable(boxAlloca);
 
       builder.setInsertionPointToStart(&ifUnallocated.getElseRegion().front());
-      mlir::Value valAlloc = builder.create<fir::AllocMemOp>(loc, innerTy);
+      mlir::Value valAlloc = builder.createHeapTemporary(
+          loc, innerTy, /*name=*/{}, /*shape=*/{}, lenParams);
       if (scalarInitValue)
         builder.createStoreWithConvert(loc, scalarInitValue, valAlloc);
-      mlir::Value box = builder.create<fir::EmboxOp>(loc, ty, valAlloc);
+      mlir::Value box = builder.create<fir::EmboxOp>(
+          loc, ty, valAlloc, /*shape=*/mlir::Value{}, /*slice=*/mlir::Value{},
+          lenParams);
       builder.create<fir::StoreOp>(loc, box, boxAlloca);
 
       createCleanupRegion(builder, loc, argType, cleanupRegion);
@@ -170,7 +197,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
       return;
     }
     innerTy = fir::extractSequenceType(boxTy);
-    if (!mlir::isa<fir::SequenceType>(innerTy))
+    if (!innerTy || !mlir::isa<fir::SequenceType>(innerTy))
       TODO(loc, "Unsupported boxed type for reduction/privatization");
 
     fir::IfOp ifUnallocated{nullptr};
@@ -230,6 +257,31 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     return;
   }
 
+  if (auto boxCharTy = mlir::dyn_cast<fir::BoxCharType>(argType)) {
+    mlir::Type eleTy = boxCharTy.getEleTy();
+    builder.setInsertionPointToStart(initBlock);
+    fir::factory::CharacterExprHelper charExprHelper{builder, loc};
+    auto [addr, len] = charExprHelper.createUnboxChar(moldArg);
+
+    // Use a heap temporary so that:
+    // 1) It is safe to use privatization inside of big loops.
+    // 2) The lifetime can outlive the current stack frame for delayed task
+    // execution.
+    // We can't always allocate a boxchar implicitly as the type of the
+    // omp.private because the allocation potentially needs the length
+    // parameters fetched above.
+    // TODO: this deviates from the intended design for delayed task execution.
+    mlir::Value privateAddr = builder.createHeapTemporary(
+        loc, eleTy, /*name=*/{}, /*shape=*/{}, /*lenParams=*/len);
+    mlir::Value boxChar = charExprHelper.createEmboxChar(privateAddr, len);
+
+    createCleanupRegion(builder, loc, argType, cleanupRegion);
+
+    builder.setInsertionPointToEnd(initBlock);
+    yield(boxChar);
+    return;
+  }
+
   TODO(loc,
        "creating reduction/privatization init region for unsupported type");
   return;
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-character.f90 b/flang/test/Lower/OpenMP/delayed-privatization-character.f90
index db678ab13bbe69..3d1a3129633719 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-character.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-character.f90
@@ -24,13 +24,13 @@ subroutine delayed_privatization_character(var1, l)
 end subroutine
 
 ! DYN_LEN-LABEL: omp.private {type = firstprivate}
-! DYN_LEN-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.boxchar<1>]] alloc {
+! DYN_LEN-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.boxchar<1>]] init {
 
-! DYN_LEN-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+! DYN_LEN-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]], %[[ALLOC_ARG:.*]]: [[TYPE]]):
 ! DYN_LEN-NEXT:   %[[UNBOX:.*]]:2 = fir.unboxchar %[[PRIV_ARG]]
-! DYN_LEN:        %[[PRIV_ALLOC:.*]] = fir.alloca !fir.char<1,?>(%[[UNBOX]]#1 : index)
-! DYN_LEN-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]] typeparams %[[UNBOX]]#1
-! DYN_LEN-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : !fir.boxchar<1>)
+! DYN_LEN-NEXT:   %[[PRIV_ALLOC:.*]] = fir.allocmem !fir.char<1,?>(%[[UNBOX]]#1 : index)
+! DYN_LEN-NEXT:   %[[EMBOXCHAR:.*]] = fir.emboxchar %[[PRIV_ALLOC]], %[[UNBOX]]#1
+! DYN_LEN:        omp.yield(%[[EMBOXCHAR]] : !fir.boxchar<1>)
 
 ! DYN_LEN-NEXT: } copy {
 ! DYN_LEN-NEXT: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):
@@ -51,9 +51,4 @@ subroutine delayed_privatization_character_static_len(var1)
 end subroutine
 
 ! STATIC_LEN-LABEL: omp.private {type = private}
-! STATIC_LEN-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.char<1,10>>]] alloc {
-
-! STATIC_LEN-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
-! STATIC_LEN-NEXT:   %[[C10:.*]] = arith.constant 10 : index
-! STATIC_LEN-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca !fir.char<1,10>
-! STATIC_LEN-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]] typeparams %[[C10]]
+! STATIC_LEN-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.char<1,10>]]
diff --git a/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90 b/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
index 99323e69113bcc..2c1b4d9e5d77f4 100644
--- a/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
+++ b/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
@@ -3,30 +3,23 @@
 ! RUN: bbc -fopenmp -emit-hlfir %s -o - \
 ! RUN: | FileCheck %s
 
-! CHECK:  omp.private {type = private} @[[BOX_HEAP_CHAR_PRIVATIZER:_QFsub01Eaaa_private_ref_box_heap_c8xU]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>> alloc {
-! CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>):
-! CHECK:             %[[VAL_4:.*]] = fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>> {bindc_name = "aaa", pinned, uniq_name = "_QFsub01Eaaa"}
+! CHECK:  omp.private {type = private} @[[BOX_HEAP_CHAR_PRIVATIZER:_QFsub01Eaaa_private_box_heap_c8xU]] : !fir.box<!fir.heap<!fir.char<1,?>>> init {
+! CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>, %[[VAL_4:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>):
 ! CHECK:             %[[VAL_5:.*]] = fir.load %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+! CHECK:             %[[ELESIZE:.*]] = fir.box_elesize %[[VAL_5]]
 ! CHECK:             %[[VAL_6:.*]] = fir.box_addr %[[VAL_5]] : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> !fir.heap<!fir.char<1,?>>
 ! CHECK:             %[[VAL_7:.*]] = fir.convert %[[VAL_6]] : (!fir.heap<!fir.char<1,?>>) -> i64
 ! CHECK:             %[[VAL_8:.*]] = arith.constant 0 : i64
-! CHECK:             %[[VAL_9:.*]] = arith.cmpi ne, %[[VAL_7]], %[[VAL_8]] : i64
+! CHECK:             %[[VAL_9:.*]] = arith.cmpi eq, %[[VAL_7]], %[[VAL_8]] : i64
 ! CHECK:             fir.if %[[VAL_9]] {
-! CHECK:               %[[ELEM_SIZE:.*]] = fir.box_elesize %{{.*}} : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> index
-! CHECK:               %[[VAL_10:.*]] = arith.constant 0 : index
-! CHECK:               %[[VAL_11:.*]] = arith.cmpi sgt, %[[ELEM_SIZE]], %[[VAL_10]] : index
-! CHECK:               %[[VAL_12:.*]] = arith.select %[[VAL_11]], %[[ELEM_SIZE]], %[[VAL_10]] : index
-! CHECK:               %[[VAL_13:.*]] = fir.allocmem !fir.char<1,?>(%[[VAL_12]] : index) {fir.must_be_heap = true, uniq_name = "_QFsub01Eaaa.alloc"}
-! CHECK:               %[[VAL_14:.*]] = fir.embox %[[VAL_13]] typeparams %[[VAL_12]] : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
-! CHECK:               fir.store %[[VAL_14]] to %[[VAL_4]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
-! CHECK:             } else {
-! CHECK:               %[[VAL_15:.*]] = fir.zero_bits !fir.heap<!fir.char<1,?>>
-! CHECK:               %[[VAL_16:.*]] = arith.constant 0 : index
-! CHECK:               %[[VAL_17:.*]] = fir.embox %[[VAL_15]] typeparams %[[VAL_16]] : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
+! CHECK:               %[[VAL_17:.*]] = fir.embox %[[VAL_6]] typeparams %[[ELESIZE]] : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
 ! CHECK:               fir.store %[[VAL_17]] to %[[VAL_4]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+! CHECK:             } else {
+! CHECK:               %[[VAL_13:.*]] = fir.allocmem !fir.char<1,?>(%[[ELESIZE]] : index)
+! CHECK:               %[[VAL_14:.*]] = fir.embox %[[VAL_13]] typeparams %[[ELESIZE]] : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
+! CHECK:               fir.store %[[VAL_14]] to %[[VAL_4]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
 ! CHECK:             }
-! CHECK:             %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_4]] {fortran_attrs = #{{.*}}<allocatable>, uniq_name = "_QFsub01Eaaa"} : (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>, !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>)
-!CHECK:              omp.yield(%[[VAL_18]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>)
+!CHECK:              omp.yield(%[[VAL_4]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>)
 !CHECK:  } dealloc {
 !CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>):
 ! CHECK:             %[[VAL_19:.*]] = fir.load %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
@@ -35,13 +28,7 @@
 ! CHECK:             %[[VAL_22:.*]] = arith.constant 0 : i64
 ! CHECK:             %[[VAL_23:.*]] = arith.cmpi ne, %[[VAL_21]], %[[VAL_22]] : i64
 ! CHECK:             fir.if %[[VAL_23]] {
-! CHECK:               %[[VAL_24:.*]] = fir.load %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
-! CHECK:               %[[VAL_25:.*]] = fir.box_addr %[[VAL_24]] : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> !fir.heap<!fir.char<1,?>>
-! CHECK:               fir.freemem %[[VAL_25]] : !fir.heap<!fir.char<1,?>>
-! CHECK:               %[[VAL_26:.*]] = fir.zero_bits !fir.heap<!fir.char<1,?>>
-! CHECK:               %[[VAL_27:.*]] = arith.constant 0 : index
-! CHECK:               %[[VAL_28:.*]] = fir.embox %[[VAL_26]] typeparams %[[VAL_27]] : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
-! CHECK:               fir.store %[[VAL_28]] to %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+! CHECK:               fir.freemem %[[VAL_20]] : !fir.heap<!fir.char<1,?>>
 !CHECK:    }
 !CHECK:    omp.yield
 !CHECK:  }
diff --git a/flang/test/Lower/OpenMP/parallel-private-clause-str.f90 b/flang/test/Lower/OpenMP/parallel-private-clause-str.f90
index 70cb4a9fde3bd2..44cb08e029aa11 100644
--- a/flang/test/Lower/OpenMP/parallel-private-clause-str.f90
+++ b/flang/test/Lower/OpenMP/parallel-private-clause-str.f90
@@ -8,45 +8,38 @@
 ! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 \
 ! RUN: | FileCheck %s
 
-!CHECK:  omp.private {type = private} @[[STR_ARR_PRIVATIZER:_QFtest_allocatable_string_arrayEc_private_ref_box_heap_Uxc8xU]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>> alloc {
-!CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>):
-!CHECK:      %[[C_PVT_BOX_REF:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>> {bindc_name = "c", pinned, uniq_name = "_QFtest_allocatable_string_arrayEc"}
+!CHECK:  omp.private {type = private} @[[STR_ARR_PRIVATIZER:_QFtest_allocatable_string_arrayEc_private_box_heap_Uxc8xU]] : [[TYPE:.*]] init {
+!CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<[[TYPE]]>, %[[C_PVT_BOX_REF:.*]]: !fir.ref<[[TYPE]]>):
 !CHECK:      %{{.*}} = fir.load %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
 !CHECK:      fir.if %{{.*}} {
-!CHECK:        %[[C_PVT_ALLOC:.*]] = fir.allocmem !fir.array<?x!fir.char<1,?>>(%{{.*}} : index), %{{.*}} {fir.must_be_heap = true, uniq_name = "_QFtest_allocatable_string_arrayEc.alloc"}
-!CHECK:        %[[C_PVT_BOX:.*]] = fir.embox %[[C_PVT_ALLOC]](%{{.*}}) typeparams %{{.*}} : (!fir.heap<!fir.array<?x!fir.char<1,?>>>, !fir.shapeshift<1>, index) -> !fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>
+!CHECK:      } else {
+!CHECK:        %[[C_PVT_ALLOC:.*]] = fir.allocmem !fir.array<?x!fir.char<1,?>>(%{{.*}} : index), %{{.*}}
+!CHECK:        %[[C_PVT_BOX:.*]] = fir.rebox
 !CHECK:        fir.store %[[C_PVT_BOX]] to %[[C_PVT_BOX_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
 !CHECK:      }
-!CHECK:      %[[C_PVT_DECL:.*]]:2 = hlfir.declare %[[C_PVT_BOX_REF]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest_allocatable_string_arrayEc"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>)
-!CHECK:      omp.yield(%[[C_PVT_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>)
+!CHECK:      omp.yield(%[[C_PVT_BOX_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>)
 !CHECK:  } dealloc {
 !CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>):
 !CHECK:      %{{.*}} = fir.load %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
 !CHECK:      fir.if %{{.*}} {
-!CHECK:        %[[C_PVT_BOX:.*]] = fir.load %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
-!CHECK:        %[[C_PVT_ADDR:.*]] = fir.box_addr %[[C_PVT_BOX]] : (!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>) -> !fir.heap<!fir.array<?x!fir.char<1,?>>>
-!CHECK:        fir.freemem %[[C_PVT_ADDR]] : !fir.heap<!fir.array<?x!fir.char<1,?>>>
+!CHECK:        fir.freemem %{{.*}} : !fir.heap<!fir.array<?x!fir.char<1,?>>>
 !CHECK:      }
 !CHECK:      omp.yield
 !CHECK:  }
 
-!CHECK:  omp.private {type = private} @[[STR_PRIVATIZER:_QFtest_allocatable_stringEc_private_ref_box_heap_c8xU]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>> alloc {
-!CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>):
-!CHECK:      %[[C_PVT_BOX_REF:.*]] = fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>> {bindc_name = "c", pinned, uniq_name = "_QFtest_allocatable_stringEc"}
+!CHECK:  omp.private {type = private} @[[STR_PRIVATIZER:_QFtest_allocatable_stringEc_private_box_heap_c8xU]] : [[TYPE:!fir.box<!fir.heap<!fir.char<1,\?>>>]] init {
+!CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<[[TYPE]]>, %[[C_PVT_BOX_REF:.*]]: !fir.ref<[[TYPE]]>):
 !CHECK:      %[[C_BOX:.*]] = fir.load %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
 !CHECK:      fir.if %{{.*}} {
-!CHECK:        %[[C_PVT_MEM:.*]] = fir.allocmem !fir.char<1,?>(%{{.*}} : index) {fir.must_be_heap = true, uniq_name = "_QFtest_allocatable_stringEc.alloc"}
+!CHECK:        %[[C_PVT_MEM:.*]] = fir.allocmem !fir.char<1,?>(%{{.*}} : index)
 !CHECK:        %[[C_PVT_BOX:.*]] = fir.embox %[[C_PVT_MEM]] typeparams %{{.*}} : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
 !CHECK:        fir.store %[[C_PVT_BOX]] to %[[C_PVT_BOX_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
 !CHECK:      }
-!CHECK:      %[[C_PVT_DECL:.*]]:2 = hlfir.declare %[[C_PVT_BOX_REF]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest_allocatable_stringEc"} : (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>, !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>)
-!CHECK:      omp.yield(%[[C_PVT_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>)
+!CHECK:      omp.yield(%[[C_PVT_BOX_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>)
 !CHECK:  } dealloc {
 !CHECK:  ^bb0(%[[ORIG_REF:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>):
 !CHECK:      fir.if %{{.*}} {
-!CHECK:        %[[C_PVT_BOX:.*]] = fir.load %[[ORIG_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
-!CHECK:        %[[C_PVT_BOX_ADDR:.*]] = fir.box_addr %[[C_PVT_BOX]] : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> !fir.heap<!fir.char<1,?>>
-!CHECK:        fir.freemem %[[C_PVT_BOX_ADDR]] : !fir.heap<!fir.char<1,?>>
+!CHECK:        fir.freemem %{{.*}} : !fir.heap<!fir.char<1,?>>
 !CHECK:      }
 !CHECK:      omp.yield
 !CHECK:  }
diff --git a/flang/test/Lower/OpenMP/private-commonblock.f90 b/flang/test/Lower/OpenMP/private-commonblock.f90
index f6d285a3b011e3..84a604cf10992d 100644
--- a/flang/test/Lower/OpenMP/private-commonblock.f90
+++ b/flang/test/Lower/OpenMP/private-commonblock.f90
@@ -17,6 +17,8 @@ subroutine private_common
   !$omp end parallel
 end subroutine
 
+!CHECK:    %[[D_BOX_ADDR:.*]] = fir.alloca !fir.box<!fir.array<5x!fir.char<1,5>>>
+!CHECK:    %[[B_BOX_ADDR:.*]] = fir.alloca !fir.box<!fir.array<10xf32>>
 !CHECK:    %[[BLK_ADDR:.*]] = fir.address_of(@blk_) : !fir.ref<!fir.array<74xi8>>
 !CHECK:    %[[I8_ARR:.*]] = fir.convert %[[BLK_ADDR]] : (!fir.ref<!fir.array<74xi8>>) -> !fir.ref<!fir.array<?xi8>>
 !CHECK:    %[[C0:.*]] = arith.constant 0 : index
@@ -48,17 +50,24 @@ subroutine private_common
 !CHECK:    %[[D_REF:.*]] = fir.convert %[[D_DECL]]#1 : (!fir.ref<!fir.array<5x!fir.char<1,5>>>) -> !fir.ref<!fir.char<1,5>>
 !CHECK:    %[[D_BOX:.*]] = fir.emboxchar %[[D_REF]], %[[TP5]] : (!fir.ref<!fir.char<1,5>>, index) -> !fir.boxchar<1>
 !CHECK:    fir.call @_QPsub1(%[[A_DECL]]#1, %[[B_DECL]]#1, %[[C_BOX]], %[[D_BOX]]) fastmath<contract> : (!fir.ref<i32>, !fir.ref<!fir.array<10xf32>>, !fir.boxchar<1>, !fir.boxchar<1>) -> ()
-!CHECK:    omp.parallel private(@{{.*}} %{{.*}}#0 -> %[[A_PVT_REF:.*]], @{{.*}} %{{.*}}#0 -> %[[B_PVT_REF:.*]], @{{.*}} %{{.*}}#0 -> %[[C_PVT_REF:.*]], @{{.*}} %{{.*}}#0 -> %[[D_PVT_REF:.*]] : {{.*}}) {
+!CHECK:    %[[B_BOX:.*]] = fir.embox %[[B_DECL]]#0(%[[SH10]])
+!CHECK:    fir.store %[[B_BOX]] to %[[B_BOX_ADDR]]
+!CHECK:    %[[D_BOX:.*]] = fir.embox %[[D_DECL]]#0(%[[SH5]])
+!CHECK:    fir.store %[[D_BOX]] to %[[D_BOX_ADDR]]
+!CHECK:    omp.parallel private(@{{.*}} %{{.*}}#0 -> %[[A_PVT_REF:.*]], @{{.*}} %[[B_BOX_ADDR]] -> %[[B_PVT_REF:.*]], @{{.*}} %{{.*}}#0 -> %[[C_PVT_REF:.*]], @{{.*}} %[[D_BOX_ADDR]] -> %[[D_PVT_REF:.*]] : {{.*}}) {
 !CHECK:      %[[A_PVT_DECL:.*]]:2 = hlfir.declare %[[A_PVT_REF]] {uniq_name = "_QFprivate_clause_commonblockEa"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!CHECK:      %[[SH10:.*]] = fir.shape %c10{{.*}} : (index) -> !fir.shape<1>
-!CHECK:      %[[B_PVT_DECL:.*]]:2 = hlfir.declare %[[B_PVT_REF]](%[[SH10]]) {uniq_name = "_QFprivate_clause_commonblockEb"} : (!fir.ref<!fir.array<10xf32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xf32>>, !fir.ref<!fir.array<10xf32>>)
+!CHECK:      %[[B_PVT_DECL:.*]]:2 = hlfir.declare %[[B_PVT_REF]] {uniq_name = "_QFprivate_clause_commonblockEb"} :
 !CHECK:      %[[C_PVT_DECL:.*]]:2 = hlfir.declare %[[C_PVT_REF]] typeparams %{{.*}} {uniq_name = "_QFprivate_clause_commonblockEc"} : (!fir.ref<!fir.char<1,5>>, index) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
-!CHECK:      %[[SH5:.*]] = fir.shape %c5{{.*}} : (index) -> !fir.shape<1>
-!CHECK:      %[[D_PVT_DECL:.*]]:2 = hlfir.declare %[[D_PVT_REF]](%[[SH5]]) typeparams %c5{{.*}} {uniq_name = "_QFprivate_clause_commonblockEd"} : (!fir.ref<!fir.array<5x!fir.char<1,5>>>, !fir.shape<1>, index) -> (!fir.ref<!fir.array<5x!fir.char<1,5>>>, !fir.ref<!fir.array<5x!fir.char<1,5>>>)
+!CHECK:      %[[D_PVT_DECL:.*]]:2 = hlfir.declare %[[D_PVT_REF]]
+!CHECK:      %[[B_LOADED:.*]] = fir.load %[[B_PVT_DECL]]#0
+!CHECK:      %[[B_ADDR:.*]] = fir.box_addr %[[B_LOADED]]
 !CHECK:      %[[C_PVT_BOX:.*]] = fir.emboxchar %[[C_PVT_DECL]]#1, %{{.*}} : (!fir.ref<!fir.char<1,5>>, index) -> !fir.boxchar<1>
-!CHECK:      %[[D_PVT_REF:.*]] = fir.convert %[[D_PVT_DECL]]#1 : (!fir.ref<!fir.array<5x!fir.char<1,5>>>) -> !fir.ref<!fir.char<1,5>>
+
+!CHECK:      %[[D_LOADED:.*]] = fir.load %[[D_PVT_DECL]]#0
+!CHECK:      %[[D_ADDR:.*]] = fir.box_addr %[[D_LOADED]]
+!CHECK:      %[[D_PVT_REF:.*]] = fir.convert %[[D_ADDR]] : (!fir.ref<!fir.array<5x!fir.char<1,5>>>) -> !fir.ref<!fir.char<1,5>>
 !CHECK:      %[[D_PVT_BOX:.*]] = fir.emboxchar %[[D_PVT_REF]], %{{.*}} : (!fir.ref<!fir.char<1,5>>, index) -> !fir.boxchar<1>
-!CHECK:      fir.call @_QPsub2(%[[A_PVT_DECL]]#1, %[[B_PVT_DECL]]#1, %[[C_PVT_BOX]], %[[D_PVT_BOX]]) fastmath<contract> : (!fir.ref<i32>, !fir.ref<!fir.array<10xf32>>, !fir.boxchar<1>, !fir.boxchar<1>) -> ()
+!CHECK:      fir.call @_QPsub2(%[[A_PVT_DECL]]#1, %[[B_ADDR]], %[[C_PVT_BOX]], %[[D_PVT_BOX]]) fastmath<contract> : (!fir.ref<i32>, !fir.ref<!fir.array<10xf32>>, !fir.boxchar<1>, !fir.boxchar<1>) -> ()
 !CHECK:      omp.terminator
 !CHECK:    }
 !CHECK:    %[[C_BOX:.*]] = fir.emboxchar %[[C_DECL]]#1, %{{.*}} : (!fir.ref<!fir.char<1,5>>, index) -> !fir.boxchar<1>

>From 2eb72ceb4528778bc0f15f73749a5c89d4d225f0 Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Mon, 13 Jan 2025 12:22:47 +0000
Subject: [PATCH 04/12] Fix omp target maps for privatized symbols

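MapsForPrivatizedSymbols used to assert that every privatized value was defined by hlfir.declare. Some types, such as arrays, are now boxed immediately before privatization. In that case the value passed to `private(...)` can be a fir.alloca holding the box. The pass now requires only that a defining op exists, and it uses the declare's base when there is one. The sketch below is a hypothetical plain C++ model of that selection. The `Value` and `Op` structs are stand-ins, not MLIR classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for an MLIR operation and value; not MLIR classes.
struct Value;
struct Op {
  std::string name;      // e.g. "hlfir.declare" or "fir.alloca"
  const Value *declBase; // for hlfir.declare only: its base (result #0)
};
struct Value {
  const Op *definingOp; // nullptr would mean a block argument
};

// Models the updated selection: a defining op is required; if it is an
// hlfir.declare, map its base, otherwise map the privatized value directly.
const Value *valueToMap(const Value &var) {
  assert(var.definingOp &&
         "Privatizing a block argument without any hlfir.declare");
  if (var.definingOp->name == "hlfir.declare")
    return var.definingOp->declBase;
  return &var;
}
```
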
---
 .../OpenMP/MapsForPrivatizedSymbols.cpp       | 12 ++-
 .../target-private-multiple-variables.f90     | 98 ++++++++-----------
 2 files changed, 47 insertions(+), 63 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp b/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
index c990bebcabde42..5d44dcd042899f 100644
--- a/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
@@ -55,15 +55,19 @@ class MapsForPrivatizedSymbolsPass
         std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
         llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO);
     Operation *definingOp = var.getDefiningOp();
-    auto declOp = llvm::dyn_cast_or_null<hlfir::DeclareOp>(definingOp);
-    assert(declOp &&
-           "Expected defining Op of privatized var to be hlfir.declare");
+    assert(definingOp &&
+           "Privatizing a block argument without any hlfir.declare");
 
+    Value varPtr = var;
     // We want the first result of the hlfir.declare op because our goal
     // is to map the descriptor (fir.box or fir.boxchar) and the first
     // result for hlfir.declare is the descriptor if a the symbol being
     // decalred needs a descriptor.
-    Value varPtr = declOp.getBase();
+    // Some types are boxed immediately before privatization. These have other
+    // operations in between the privatization and the declaration. It is safe
+    // to use var directly here because they will be boxed anyway.
+    if (auto declOp = llvm::dyn_cast<hlfir::DeclareOp>(definingOp))
+      varPtr = declOp.getBase();
 
     // If we do not have a reference to descritor, but the descriptor itself
     // then we need to store that on the stack so that we can map the
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-multiple-variables.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-multiple-variables.f90
index f3f9bbe4a76a28..5d31de10d74f87 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-multiple-variables.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/target-private-multiple-variables.f90
@@ -38,95 +38,81 @@ end subroutine target_allocatable
 !
 ! CHECK:      omp.private {type = private}
 ! CHECK-SAME:   @[[CHAR_PRIVATIZER_SYM:[^[:space:]]+char_var[^[:space:]]+]]
-! CHECK-SAME:   : [[CHAR_TYPE:!fir.boxchar<1>]] alloc {
+! CHECK-SAME:   : [[CHAR_TYPE:!fir.boxchar<1>]] init {
 !
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[CHAR_TYPE]]):
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[CHAR_TYPE]], %[[UNUSED:.*]]: [[CHAR_TYPE]]):
 ! CHECK-NEXT:   %[[UNBOX:.*]]:2 = fir.unboxchar %[[PRIV_ARG]]
-! CHECK:        %[[PRIV_ALLOC:.*]] = fir.alloca !fir.char<1,?>(%[[UNBOX]]#1 : index)
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]] typeparams %[[UNBOX]]#1
-! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : [[CHAR_TYPE]])
-! CHECK-NEXT: }
+! CHECK:        %[[PRIV_ALLOC:.*]] = fir.allocmem !fir.char<1,?>(%[[UNBOX]]#1 : index)
+! CHECK:        %[[BOXCHAR:.*]] = fir.emboxchar %[[PRIV_ALLOC]], %[[UNBOX]]#1
+! CHECK-NEXT:   omp.yield(%[[BOXCHAR]] : [[CHAR_TYPE]])
+! CHECK-NEXT: } dealloc {
 
 ! Test the privatizer for `complex`
 !
 ! CHECK:      omp.private {type = private}
 ! CHECK-SAME:   @[[COMP_PRIVATIZER_SYM:[^[:space:]]+comp_var[^[:space:]]+]]
-! CHECK-SAME:   : [[COMP_TYPE:!fir.ref<complex<f32>>]] alloc {
-!
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[COMP_TYPE]]):
-! CHECK-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca complex<f32>
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
-! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : [[COMP_TYPE]])
-! CHECK-NEXT: }
+! CHECK-SAME:   : [[COMP_TYPE:complex<f32>]]{{$}}
 
 ! Test the privatizer for `real(:)`
 !
 ! CHECK:      omp.private {type = private}
 ! CHECK-SAME:   @[[ARR_PRIVATIZER_SYM:[^[:space:]]+real_arr[^[:space:]]+]]
-! CHECK-SAME:   : [[ARR_TYPE:!fir.box<!fir.array<\?xf32>>]] alloc {
-!
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[ARR_TYPE]]):
-! CHECK:        %[[C0:.*]] = arith.constant 0 : index
-! CHECK-NEXT:   %[[DIMS:.*]]:3 = fir.box_dims %[[PRIV_ARG]], %[[C0]] : ([[ARR_TYPE]], index)
-! CHECK:        %[[PRIV_ALLOCA:.*]] = fir.alloca !fir.array<{{\?}}xf32>
-! CHECK-NEXT:   %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[DIMS]]#0, %[[DIMS]]#1
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOCA]](%[[SHAPE_SHIFT]])
-! CHECK-NEXT:  omp.yield(%[[PRIV_DECL]]#0 : [[ARR_TYPE]])
+! CHECK-SAME:   : [[ARR_TYPE:!fir.box<!fir.array<\?xf32>>]] init {
+!
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<[[ARR_TYPE]]>, %[[PRIV_ALLOC:.*]]: !fir.ref<[[ARR_TYPE]]>):
+! CHECK-NEXT:   %[[MOLD:.*]] = fir.load %[[PRIV_ARG]]
+! CHECK-NEXT:   %[[C0:.*]] = arith.constant 0 : index
+! CHECK-NEXT:   %[[BOX_DIMS:.*]]:3 = fir.box_dims %[[MOLD]], %[[C0]]
+! CHECK-NEXT:   %[[SHAPE:.*]] = fir.shape %[[BOX_DIMS]]#1
+! CHECK-NEXT:   %[[DATA_ALLOC:.*]] = fir.allocmem !fir.array<?xf32>, %[[BOX_DIMS]]#1
+! CHECK-NEXT:   %[[TRUE:.*]] = arith.constant true
+! CHECK-NEXT:   %[[DECL:.*]]:2 = hlfir.declare %[[DATA_ALLOC:.*]](%[[SHAPE]])
+! CHECK-NEXT:   %[[C0_2:.*]] = arith.constant 0 : index
+! CHECK-NEXT:   %[[BOX_DIMS_2:.*]]:3 = fir.box_dims %[[MOLD]], %[[C0_2]]
+! CHECK-NEXT:   %[[SHAPE_SHIFT:.*]] = fir.shape_shift %[[BOX_DIMS_2]]#0, %[[BOX_DIMS_2]]#1
+! CHECK-NEXT:   %[[BOX:.*]] = fir.rebox %[[DECL]]#0(%[[SHAPE_SHIFT]])
+! CHECK-NEXT:   fir.store %[[BOX]] to %[[PRIV_ALLOC]]
+! CHECK-NEXT:   omp.yield(%[[PRIV_ALLOC]] : !fir.ref<[[ARR_TYPE]]>)
 ! CHECK-NEXT: }
 
 ! Test the privatizer for `real(:)`'s lower bound
 !
 ! CHECK:      omp.private {type = private}
 ! CHECK-SAME:   @[[LB_PRIVATIZER_SYM:[^[:space:]]+lb[^[:space:]]+]]
-! CHECK-SAME:   : [[LB_TYPE:!fir.ref<i64>]] alloc {
-
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[LB_TYPE]]):
-! CHECK-NEXT:   %[[PRIV_ALLOCA:.*]] = fir.alloca i64
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOCA]]
-! CHECK-NEXT:  omp.yield(%[[PRIV_DECL]]#0 : [[LB_TYPE]])
-! CHECK-NEXT: }
+! CHECK-SAME:   : [[LB_TYPE:i64]]{{$}}
 
 ! Test the privatizer for `real`
 !
 ! CHECK:      omp.private {type = private}
 ! CHECK-SAME:   @[[REAL_PRIVATIZER_SYM:[^[:space:]]+real_var[^[:space:]]+]]
-! CHECK-SAME:   : [[REAL_TYPE:!fir.ref<f32>]] alloc {
-
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[REAL_TYPE]]):
-! CHECK-NEXT:   %[[PRIV_ALLOCA:.*]] = fir.alloca f32
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOCA]]
-! CHECK-NEXT:  omp.yield(%[[PRIV_DECL]]#0 : [[REAL_TYPE]])
-! CHECK-NEXT: }
+! CHECK-SAME:   : [[REAL_TYPE:f32]]{{$}}
 
 ! Test the privatizer for `allocatable`
 !
 ! CHECK:      omp.private {type = private}
 ! CHECK-SAME:   @[[ALLOC_PRIVATIZER_SYM:[^[:space:]]+alloc_var[^[:space:]]+]]
-! CHECK-SAME:   : [[ALLOC_TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+! CHECK-SAME:   : [[ALLOC_TYPE:!fir.box<!fir.heap<i32>>]] init {
 !
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[ALLOC_TYPE]]):
-! CHECK:        %[[PRIV_ALLOC:.*]] = fir.alloca !fir.box<!fir.heap<i32>>
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<[[ALLOC_TYPE]]>, %[[PRIV_ALLOC:.*]]: !fir.ref<[[ALLOC_TYPE]]>):
 ! CHECK-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CHECK-NEXT:   %[[PRIV_ARG_BOX:.*]] = fir.box_addr %[[PRIV_ARG_VAL]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
 ! CHECK-NEXT:   %[[PRIV_ARG_ADDR:.*]] = fir.convert %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> i64
 ! CHECK-NEXT:   %[[C0:.*]] = arith.constant 0 : i64
-! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi ne, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
+! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi eq, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
 !
 ! CHECK-NEXT:   fir.if %[[ALLOC_COND]] {
-! CHECK:          %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32 {fir.must_be_heap = true, {{.*}}}
+! CHECK-NEXT:     %[[ZERO_BOX:.*]] = fir.embox %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK-NEXT:     fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:   } else {
+! CHECK:          %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32
 ! CHECK-NEXT:     %[[PRIV_ALLOCMEM_BOX:.*]] = fir.embox %[[PRIV_ALLOCMEM]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
 ! CHECK-NEXT:     fir.store %[[PRIV_ALLOCMEM_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
-! CHECK-NEXT:   } else {
-! CHECK-NEXT:     %[[ZERO_BITS:.*]] = fir.zero_bits !fir.heap<i32>
-! CHECK-NEXT:     %[[ZERO_BOX:.*]] = fir.embox %[[ZERO_BITS]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
-! CHECK-NEXT:     fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CHECK-NEXT:   }
 !
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
-! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : [[ALLOC_TYPE]])
+! CHECK-NEXT:   omp.yield(%[[PRIV_ALLOC]] : !fir.ref<[[ALLOC_TYPE]]>)
 !
 ! CHECK-NEXT: } dealloc {
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[ALLOC_TYPE]]):
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<[[ALLOC_TYPE]]>):
 !
 ! CHECK-NEXT:   %[[PRIV_VAL:.*]] = fir.load %[[PRIV_ARG]]
 ! CHECK-NEXT:   %[[PRIV_ADDR:.*]] = fir.box_addr %[[PRIV_VAL]]
@@ -135,12 +121,7 @@ end subroutine target_allocatable
 ! CHECK-NEXT:   %[[PRIV_NULL_COND:.*]] = arith.cmpi ne, %[[PRIV_ADDR_I64]], %[[C0]] : i64
 !
 ! CHECK-NEXT:   fir.if %[[PRIV_NULL_COND]] {
-! CHECK:          %[[PRIV_VAL_2:.*]] = fir.load %[[PRIV_ARG]]
-! CHECK-NEXT:     %[[PRIV_ADDR_2:.*]] = fir.box_addr %[[PRIV_VAL_2]]
-! CHECK-NEXT:     fir.freemem %[[PRIV_ADDR_2]]
-! CHECK-NEXT:     %[[ZEROS:.*]] = fir.zero_bits
-! CHECK-NEXT:     %[[ZEROS_BOX:.*]]  = fir.embox %[[ZEROS]]
-! CHECK-NEXT:     fir.store %[[ZEROS_BOX]] to %[[PRIV_ARG]]
+! CHECK-NEXT:     fir.freemem %[[PRIV_ADDR]]
 ! CHECK-NEXT:   }
 !
 ! CHECK-NEXT:   omp.yield
@@ -157,9 +138,9 @@ end subroutine target_allocatable
 ! CHECK:        %[[CHAR_VAR_DECL:.*]]:2 = hlfir.declare %[[CHAR_VAR_ALLOC]] typeparams
 ! CHECK:        %[[REAL_ARR_ALLOC:.*]] = fir.alloca !fir.array<?xf32>, {{.*}} {bindc_name = "real_arr", {{.*}}}
 ! CHECK:        %[[REAL_ARR_DECL:.*]]:2 = hlfir.declare %[[REAL_ARR_ALLOC]]({{.*}})
+! CHECK:        fir.store %[[REAL_ARR_DECL]]#0 to %[[REAL_ARR_DESC_ALLOCA]] : !fir.ref<!fir.box<!fir.array<?xf32>>>
 ! CHECK:        %[[MAPPED_MI0:.*]] = omp.map.info var_ptr(%[[MAPPED_DECL]]#1 : !fir.ref<i32>, i32) {{.*}}
 ! CHECK:        %[[ALLOC_VAR_MAP:.*]] = omp.map.info var_ptr(%[[ALLOC_VAR_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.box<!fir.heap<i32>>)
-! CHECK:        fir.store %[[REAL_ARR_DECL]]#0 to %[[REAL_ARR_DESC_ALLOCA]] : !fir.ref<!fir.box<!fir.array<?xf32>>>
 ! CHECK:        %[[REAL_ARR_DESC_MAP:.*]] = omp.map.info var_ptr(%[[REAL_ARR_DESC_ALLOCA]] : !fir.ref<!fir.box<!fir.array<?xf32>>>, !fir.box<!fir.array<?xf32>>)
 ! CHECK:        fir.store %[[CHAR_VAR_DECL]]#0 to %[[CHAR_VAR_DESC_ALLOCA]] : !fir.ref<!fir.boxchar<1>>
 ! CHECK:        %[[CHAR_VAR_DESC_MAP:.*]] = omp.map.info var_ptr(%[[CHAR_VAR_DESC_ALLOCA]] : !fir.ref<!fir.boxchar<1>>, !fir.boxchar<1>)
@@ -174,16 +155,15 @@ end subroutine target_allocatable
 ! CHECK-SAME:       @[[ALLOC_PRIVATIZER_SYM]] %{{[^[:space:]]+}}#0 -> %[[ALLOC_ARG:[^,]+]] [map_idx=1],
 ! CHECK-SAME:       @[[REAL_PRIVATIZER_SYM]] %{{[^[:space:]]+}}#0 -> %[[REAL_ARG:[^,]+]],
 ! CHECK-SAME:       @[[LB_PRIVATIZER_SYM]] %{{[^[:space:]]+}}#0 -> %[[LB_ARG:[^,]+]],
-! CHECK-SAME:       @[[ARR_PRIVATIZER_SYM]] %{{[^[:space:]]+}}#0 -> %[[ARR_ARG:[^,]+]] [map_idx=2],
+! CHECK-SAME:       @[[ARR_PRIVATIZER_SYM]] %{{[^[:space:]]+}} -> %[[ARR_ARG:[^,]+]] [map_idx=2],
 ! CHECK-SAME:       @[[COMP_PRIVATIZER_SYM]] %{{[^[:space:]]+}}#0 -> %[[COMP_ARG:[^,]+]],
 ! CHECK-SAME:       @[[CHAR_PRIVATIZER_SYM]] %{{[^[:space:]]+}}#0 -> %[[CHAR_ARG:[^,]+]] [map_idx=3] :
-! CHECK-SAME:       !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<f32>, !fir.ref<i64>, !fir.box<!fir.array<?xf32>>, !fir.ref<complex<f32>>, !fir.boxchar<1>) {
+! CHECK-SAME:       !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<f32>, !fir.ref<i64>, !fir.ref<!fir.box<!fir.array<?xf32>>>, !fir.ref<complex<f32>>, !fir.boxchar<1>) {
 ! CHECK-NOT:      fir.alloca
 ! CHECK:          hlfir.declare %[[ALLOC_ARG]]
 ! CHECK:          hlfir.declare %[[REAL_ARG]]
 ! CHECK:          hlfir.declare %[[LB_ARG]]
-! CHECK:          %[[ARR_ARG_ADDR:.*]] = fir.box_addr %[[ARR_ARG]]
-! CHECK:          hlfir.declare %[[ARR_ARG_ADDR]]
+! CHECK:          hlfir.declare %[[ARR_ARG]]
 ! CHECK:          hlfir.declare %[[COMP_ARG]]
 ! CHECK:          %[[CHAR_ARG_UNBOX:.*]]:2 = fir.unboxchar %[[CHAR_ARG]]
 ! CHECK:          hlfir.declare %[[CHAR_ARG_UNBOX]]

>From dc820bb1d9a7f32414a73248d702eeea956641f2 Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Wed, 11 Dec 2024 14:13:16 +0000
Subject: [PATCH 05/12] Support pointers

The copy region codegen hasn't changed as a result of this patch series.
However, I think there is a bug in the copy region generated in
equivalence.f90. This patch series is already too big, so I won't change
the copy region here.
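
The functional change in this patch is the `isPrivate` early exit in
`populateByRefInitAndCleanupRegions`. For a privatizer, a POINTER's init
region stores a null box and never loads the mold. Allocatables, and
reductions of pointers, still load the mold and allocate only if it is
allocated or associated. For firstprivate, the copy region then overwrites the
null box with the original descriptor (see delayed-privatization-pointer.f90
below). The sketch below is a simplified, hypothetical C++ model of that
decision. `Box`, `Kind` and `initPrivateCopy` are invented for illustration
and are not flang APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a Fortran descriptor (fir.box); only the base
// address matters for this sketch.
struct Box {
  void *addr = nullptr;
};

enum class Kind { Allocatable, Pointer };

// Hypothetical model of the init-region decision; not flang code.
// `isPrivatizer` plays the role of the new `isPrivate` parameter.
Box initPrivateCopy(const Box *mold, Kind kind, bool isPrivatizer,
                    std::size_t bytes) {
  if (isPrivatizer && kind == Kind::Pointer) {
    // A private POINTER starts with undefined association status, so the
    // mold is never read; a null box is stored just in case.
    return Box{};
  }
  // Every other path reads the mold (builder.loadIfRef in the patch).
  assert(mold != nullptr && "this path needs the mold descriptor");
  if (mold->addr == nullptr)
    return Box{};                 // unallocated/disassociated mold: null box
  return Box{std::malloc(bytes)}; // otherwise fresh heap storage (fir.allocmem)
}
```

Passing a null mold for a privatized pointer is safe in this model precisely
because that path never dereferences it.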
---
 .../lib/Lower/OpenMP/DataSharingProcessor.cpp | 12 ++++---
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 31 ++++++++++++++-----
 .../lib/Lower/OpenMP/PrivateReductionUtils.h  | 15 +++++----
 .../DelayedPrivatization/equivalence.f90      | 16 +++++-----
 .../OpenMP/delayed-privatization-pointer.f90  | 12 +++----
 5 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 25afcb0b325e3d..fce2b7fbd4cd25 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -504,14 +504,16 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
   assert(hsb && "Host symbol box not found");
 
   mlir::Value privVal = hsb.getAddr();
-  mlir::Type allocType = fir::unwrapRefType(privVal.getType());
+  mlir::Type allocType;
+  if (mlir::isa<fir::PointerType>(privVal.getType()))
+    allocType = privVal.getType();
+  else
+    allocType = fir::unwrapRefType(privVal.getType());
+
   mlir::Location symLoc = hsb.getAddr().getLoc();
   std::string privatizerName = sym->name().ToString() + ".privatizer";
   bool isFirstPrivate = sym->test(semantics::Symbol::Flag::OmpFirstPrivate);
 
-  if (mlir::isa<fir::PointerType>(hsb.getAddr().getType()))
-    TODO(symLoc, "Privatization of pointers");
-
   if (auto poly = mlir::dyn_cast<fir::ClassType>(allocType)) {
     if (!mlir::isa<fir::PointerType>(poly.getEleTy()) && isFirstPrivate)
       TODO(symLoc, "create polymorphic host associated copy");
@@ -580,7 +582,7 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
       populateByRefInitAndCleanupRegions(
           firOpBuilder, symLoc, argType, /*scalarInitValue=*/nullptr, initBlock,
           result.getInitPrivateArg(), result.getInitMoldArg(),
-          result.getDeallocRegion());
+          result.getDeallocRegion(), /*isPrivate=*/true);
     }
 
     // Populate the `copy` region if this is a `firstprivate`.
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 0835030dbbeb43..8443bf11128b7c 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -119,7 +119,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
     mlir::Value scalarInitValue, mlir::Block *initBlock,
     mlir::Value allocatedPrivVarArg, mlir::Value moldArg,
-    mlir::Region &cleanupRegion) {
+    mlir::Region &cleanupRegion, bool isPrivate) {
   mlir::Type ty = fir::unwrapRefType(argType);
   builder.setInsertionPointToEnd(initBlock);
   auto yield = [&](mlir::Value ret) {
@@ -147,11 +147,10 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
   //   fir.store %something to %box_alloca
   // }
   // omp.yield %box_alloca
-  moldArg = builder.loadIfRef(loc, moldArg);
   mlir::SmallVector<mlir::Value> lenParams;
-  hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg}, lenParams);
-  auto handleNullAllocatable = [&](mlir::Value boxAlloca) -> fir::IfOp {
-    mlir::Value addr = builder.create<fir::BoxAddrOp>(loc, moldArg);
+  auto handleNullAllocatable = [&](mlir::Value boxAlloca,
+                                   mlir::Value loadedMold) -> fir::IfOp {
+    mlir::Value addr = builder.create<fir::BoxAddrOp>(loc, loadedMold);
     mlir::Value isNotAllocated = builder.genIsNullAddr(loc, addr);
     fir::IfOp ifOp = builder.create<fir::IfOp>(loc, isNotAllocated,
                                                /*withElseRegion=*/true);
@@ -171,6 +170,21 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
 
     builder.setInsertionPointToEnd(initBlock);
     mlir::Value boxAlloca = allocatedPrivVarArg;
+
+    // The initial state of a private pointer is undefined so we don't need to
+    // match the mold argument (OpenMP 5.2 end of page 106).
+    if (isPrivate && mlir::isa<fir::PointerType>(boxTy.getEleTy())) {
+      // Just incase, do initialize the box with a null value
+      mlir::Value null = builder.createNullConstant(loc, boxTy.getEleTy());
+      mlir::Value nullBox = builder.create<fir::EmboxOp>(loc, boxTy, null);
+      builder.create<fir::StoreOp>(loc, nullBox, boxAlloca);
+      yield(boxAlloca);
+      return;
+    }
+
+    moldArg = builder.loadIfRef(loc, moldArg);
+    hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg}, lenParams);
+
     mlir::Type innerTy = fir::unwrapRefType(boxTy.getEleTy());
     bool isChar = fir::isa_char(innerTy);
     if (fir::isa_trivial(innerTy) || isChar) {
@@ -179,7 +193,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
         TODO(loc,
              "Reduction/Privatization of non-allocatable trivial typed box");
 
-      fir::IfOp ifUnallocated = handleNullAllocatable(boxAlloca);
+      fir::IfOp ifUnallocated = handleNullAllocatable(boxAlloca, moldArg);
 
       builder.setInsertionPointToStart(&ifUnallocated.getElseRegion().front());
       mlir::Value valAlloc = builder.createHeapTemporary(
@@ -200,9 +214,12 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     if (!innerTy || !mlir::isa<fir::SequenceType>(innerTy))
       TODO(loc, "Unsupported boxed type for reduction/privatization");
 
+    moldArg = builder.loadIfRef(loc, moldArg);
+    hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg}, lenParams);
+
     fir::IfOp ifUnallocated{nullptr};
     if (isAllocatableOrPointer) {
-      ifUnallocated = handleNullAllocatable(boxAlloca);
+      ifUnallocated = handleNullAllocatable(boxAlloca, moldArg);
       builder.setInsertionPointToStart(&ifUnallocated.getElseRegion().front());
     }
 
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
index b4abc40cd4b674..b81b00e1784789 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
@@ -31,14 +31,13 @@ namespace omp {
 
 /// Generate init and cleanup regions suitable for reduction or privatizer
 /// declarations. `scalarInitValue` may be nullptr if there is no default
-/// initialization (for privatization).
-void populateByRefInitAndCleanupRegions(fir::FirOpBuilder &builder,
-                                        mlir::Location loc, mlir::Type argType,
-                                        mlir::Value scalarInitValue,
-                                        mlir::Block *initBlock,
-                                        mlir::Value allocatedPrivVarArg,
-                                        mlir::Value moldArg,
-                                        mlir::Region &cleanupRegion);
+/// initialization (for privatization). If this is for a privatizer, set
+/// `isPrivate` to `true`.
+void populateByRefInitAndCleanupRegions(
+    fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
+    mlir::Value scalarInitValue, mlir::Block *initBlock,
+    mlir::Value allocatedPrivVarArg, mlir::Value moldArg,
+    mlir::Region &cleanupRegion, bool isPrivate = false);
 
 /// Generate a fir::ShapeShift op describing the provided boxed array.
 fir::ShapeShiftOp getShapeShift(fir::FirOpBuilder &builder, mlir::Location loc,
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/equivalence.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/equivalence.f90
index 2307c09513795f..721bfff012f148 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/equivalence.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/equivalence.f90
@@ -13,13 +13,15 @@ subroutine private_common
   !$omp end parallel
 end subroutine
 
-! CHECK:  omp.private {type = firstprivate} @[[X_PRIVATIZER:.*]] : ![[X_TYPE:fir.ptr<f32>]] alloc {
-! CHECK:  ^bb0(%{{.*}}: ![[X_TYPE]]):
-! CHECK:    %[[PRIV_ALLOC:.*]] = fir.alloca f32 {bindc_name = "x", {{.*}}}
-! CHECK:    %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]] {{{.*}}} : (![[PRIV_TYPE:fir.ref<f32>]]) -> ({{.*}})
-! CHECK:    %[[PRIV_CONV:.*]] = fir.convert %[[PRIV_DECL]]#0 : (![[PRIV_TYPE]]) -> ![[X_TYPE]]
-! CHECK:    omp.yield(%[[PRIV_CONV]] : ![[X_TYPE]])
-! CHECK:  } copy {
+! TODO: the copy region for pointers is incorrect. OpenMP 5.2 says
+!
+! > If the original list item has the POINTER attribute, the new list items
+! > receive the same association status as the original list item
+!
+! Currently the original pointer is unconditionally loaded, which is undefined
+! behavior if that pointer is not associated.
+
+! CHECK:  omp.private {type = firstprivate} @[[X_PRIVATIZER:.*]] : ![[X_TYPE:fir.ptr<f32>]] copy {
 ! CHECK:  ^bb0(%[[ORIG_PTR:.*]]: ![[X_TYPE]], %[[PRIV_REF:.*]]: ![[X_TYPE]]):
 ! CHECK:    %[[ORIG_VAL:.*]] = fir.load %[[ORIG_PTR]] : !fir.ptr<f32>
 ! CHECK:    hlfir.assign %[[ORIG_VAL]] to %[[PRIV_REF]] : f32, ![[X_TYPE]]
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-pointer.f90 b/flang/test/Lower/OpenMP/delayed-privatization-pointer.f90
index c96b0b49fd5307..9b6aab6b55d693 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-pointer.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-pointer.f90
@@ -15,20 +15,18 @@ subroutine delayed_privatization_pointer
 end subroutine
 
 ! CHECK-LABEL: omp.private {type = firstprivate}
-! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.ptr<i32>>>]] alloc {
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.box<!fir.ptr<i32>>]] init {
 
-! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<[[TYPE]]>, %[[PRIV_ALLOC:.*]]: !fir.ref<[[TYPE]]>):
 
-! CHECK-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_pointerEvar1"}
 ! CHECK-NEXT:   %[[NULL:.*]] = fir.zero_bits !fir.ptr<i32>
 ! CHECK-NEXT:   %[[INIT:.*]] = fir.embox %[[NULL]] : (!fir.ptr<i32>) -> !fir.box<!fir.ptr<i32>>
 ! CHECK-NEXT:   fir.store %[[INIT]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.ptr<i32>>>
-! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
-! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : [[TYPE]])
+! CHECK-NEXT:   omp.yield(%[[PRIV_ALLOC]] : !fir.ref<[[TYPE]]>)
 
 ! CHECK-NEXT: } copy {
-! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):
+! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: !fir.ref<[[TYPE]]>, %[[PRIV_PRIV_ARG:.*]]: !fir.ref<[[TYPE]]>):
 ! CHECK-NEXT:    %[[ORIG_BASE_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
  ! CHECK-NEXT:   fir.store %[[ORIG_BASE_VAL]] to %[[PRIV_PRIV_ARG]] : !fir.ref<!fir.box<!fir.ptr<i32>>>
-! CHECK-NEXT:   omp.yield(%[[PRIV_PRIV_ARG]] : [[TYPE]])
+! CHECK-NEXT:   omp.yield(%[[PRIV_PRIV_ARG]] : !fir.ref<[[TYPE]]>)
 ! CHECK-NEXT: }

>From 870af174eee0f862f515862d198b4ed299780de4 Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Tue, 14 Jan 2025 17:06:31 +0000
Subject: [PATCH 06/12] Fix crash lowering fir::EmboxOp without shape

This only affects POINTERs (which should not be initialized) and NULL
allocatables, so the actual contents of the shape don't matter. The shape
is needed only so that the embox operation is converted to LLVMIR correctly.
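
The new helper `generateZeroShapeForRank` builds a `fir.shape` of all-zero
extents. Its length equals the rank of the boxed `fir.sequence` type. For
non-array types it returns a null value, and the embox is then created
without a shape. Below is a hypothetical C++ model of just that rule, using a
plain `std::vector` in place of MLIR values. `zeroShapeForRank` is an invented
name, not a flang function.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of generateZeroShapeForRank (not flang code).
// `rank` is std::nullopt when the boxed element type is not an array;
// the real helper returns a null mlir::Value in that case.
std::optional<std::vector<std::int64_t>>
zeroShapeForRank(std::optional<unsigned> rank) {
  if (!rank)
    return std::nullopt; // scalar: embox is built without a shape
  // Since Fortran 2008, rank plus corank may not exceed 15.
  assert(*rank <= 15 && "unexpected Fortran array rank");
  // One zero extent per dimension: the values are irrelevant for a null
  // box, but the count fixes the rank of the lowered descriptor struct.
  return std::vector<std::int64_t>(*rank, 0);
}
```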
---
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 35 ++++++++++++++-
 ...elayed-privatization-allocatable-array.f90 |  4 +-
 .../parallel-reduction-allocatable-array.f90  |  4 +-
 .../parallel-reduction-pointer-array.f90      |  4 +-
 flang/test/Lower/OpenMP/pointer-to-array.f90  | 43 +++++++++++++++++++
 ...oop-reduction-allocatable-array-minmax.f90 |  8 +++-
 6 files changed, 91 insertions(+), 7 deletions(-)
 create mode 100644 flang/test/Lower/OpenMP/pointer-to-array.f90

diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 8443bf11128b7c..41c70debed3b4d 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -115,6 +115,25 @@ fir::ShapeShiftOp Fortran::lower::omp::getShapeShift(fir::FirOpBuilder &builder,
   return shapeShift;
 }
 
+static mlir::Value generateZeroShapeForRank(fir::FirOpBuilder &builder,
+                                            mlir::Location loc,
+                                            mlir::Value moldArg) {
+  mlir::Type moldVal = fir::unwrapRefType(moldArg.getType());
+  mlir::Type eleType = fir::dyn_cast_ptrOrBoxEleTy(moldVal);
+  fir::SequenceType seqTy =
+      mlir::dyn_cast_if_present<fir::SequenceType>(eleType);
+  if (!seqTy)
+    return nullptr;
+
+  unsigned rank = seqTy.getShape().size();
+  mlir::Value zero =
+      builder.createIntegerConstant(loc, builder.getIndexType(), 0);
+  mlir::SmallVector<mlir::Value> dims;
+  dims.resize(rank, zero);
+  mlir::Type shapeTy = fir::ShapeType::get(builder.getContext(), rank);
+  return builder.create<fir::ShapeOp>(loc, shapeTy, dims);
+}
+
 void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
     mlir::Value scalarInitValue, mlir::Block *initBlock,
@@ -156,8 +175,12 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
                                                /*withElseRegion=*/true);
     builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
     // just embox the null address and return
+    // we have to give the embox a shape so that the LLVM box structure has the
+    // right rank. This returns nullptr if the types don't match.
+    mlir::Value shape = generateZeroShapeForRank(builder, loc, moldArg);
+
     mlir::Value nullBox =
-        builder.create<fir::EmboxOp>(loc, ty, addr, /*shape=*/mlir::Value{},
+        builder.create<fir::EmboxOp>(loc, ty, addr, shape,
                                      /*slice=*/mlir::Value{}, lenParams);
     builder.create<fir::StoreOp>(loc, nullBox, boxAlloca);
     return ifOp;
@@ -174,9 +197,17 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     // The initial state of a private pointer is undefined so we don't need to
     // match the mold argument (OpenMP 5.2 end of page 106).
     if (isPrivate && mlir::isa<fir::PointerType>(boxTy.getEleTy())) {
+      // we need a shape with the right rank so that the embox op is lowered
+      // to an llvm struct of the right type. This returns nullptr if the types
+      // aren't right.
+      mlir::Value shape = generateZeroShapeForRank(builder, loc, moldArg);
       // Just incase, do initialize the box with a null value
       mlir::Value null = builder.createNullConstant(loc, boxTy.getEleTy());
-      mlir::Value nullBox = builder.create<fir::EmboxOp>(loc, boxTy, null);
+      mlir::Value nullBox;
+      if (shape)
+        nullBox = builder.create<fir::EmboxOp>(loc, boxTy, null, shape);
+      else
+        nullBox = builder.create<fir::EmboxOp>(loc, boxTy, null);
       builder.create<fir::StoreOp>(loc, nullBox, boxAlloca);
       yield(boxAlloca);
       return;
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
index da093b2e97ef5b..9b6dbabf0c6ffc 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-array.f90
@@ -27,7 +27,9 @@ subroutine delayed_privatization_private(var1, l1)
 ! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi eq, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
 
 ! CHECK-NEXT:   fir.if %[[ALLOC_COND]] {
-! CHECK-NEXT:     %[[EMBOX_2:.*]] = fir.embox %[[PRIV_ARG_BOX]]
+! CHECK-NEXT:     %[[C0_2:.*]] = arith.constant 0 : index
+! CHECK-NEXT:     %[[SHAPE:.*]] = fir.shape %[[C0_2]]
+! CHECK-NEXT:     %[[EMBOX_2:.*]] = fir.embox %[[PRIV_ARG_BOX]](%[[SHAPE]])
 ! CHECK-NEXT:     fir.store %[[EMBOX_2]] to %[[PRIV_ALLOC]]
 ! CHECK-NEXT:   } else {
 ! CHECK-NEXT:     %[[C0:.*]] = arith.constant 0 : index
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90
index dabd495d733b55..25dbb75c54a818 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90
@@ -30,7 +30,9 @@ program reduce
 ! CHECK:           %[[C0_I64:.*]] = arith.constant 0 : i64
 ! CHECK:           %[[IS_NULL:.*]] = arith.cmpi eq, %[[ADDRI]], %[[C0_I64]] : i64
 ! CHECK:           fir.if %[[IS_NULL]] {
-! CHECK:             %[[NULL_BOX:.*]] = fir.embox %[[ADDR]] : (!fir.heap<!fir.array<?xi32>>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
+! CHECK:             %[[C0_INDEX:.*]] = arith.constant 0 : index
+! CHECK:             %[[SHAPE:.*]] = fir.shape %[[C0_INDEX]]
+! CHECK:             %[[NULL_BOX:.*]] = fir.embox %[[ADDR]](%[[SHAPE]]) : (!fir.heap<!fir.array<?xi32>>, !fir.shape<1>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
 ! CHECK:             fir.store %[[NULL_BOX]] to %[[ALLOC]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
 ! CHECK:           } else {
 ! CHECK:             %[[VAL_3:.*]] = arith.constant 0 : index
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90
index 1e07018a68877a..a22a8f693d8a25 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90
@@ -31,7 +31,9 @@ program reduce
 ! CHECK:           %[[VAL_6:.*]] = arith.constant 0 : i64
 ! CHECK:           %[[VAL_7:.*]] = arith.cmpi eq, %[[VAL_5]], %[[VAL_6]] : i64
 ! CHECK:           fir.if %[[VAL_7]] {
-! CHECK:             %[[VAL_8:.*]] = fir.embox %[[VAL_4]] : (!fir.ptr<!fir.array<?xi32>>) -> !fir.box<!fir.ptr<!fir.array<?xi32>>>
+! CHECK:             %[[C0:.*]] = arith.constant 0 : index
+! CHECK:             %[[SHAPE:.*]] = fir.shape %[[C0]]
+! CHECK:             %[[VAL_8:.*]] = fir.embox %[[VAL_4]](%[[SHAPE]]) : (!fir.ptr<!fir.array<?xi32>>, !fir.shape<1>) -> !fir.box<!fir.ptr<!fir.array<?xi32>>>
 ! CHECK:             fir.store %[[VAL_8]] to %[[ALLOC]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>
 ! CHECK:           } else {
 ! CHECK:             %[[VAL_9:.*]] = arith.constant 0 : index
diff --git a/flang/test/Lower/OpenMP/pointer-to-array.f90 b/flang/test/Lower/OpenMP/pointer-to-array.f90
new file mode 100644
index 00000000000000..1861b3907bcf07
--- /dev/null
+++ b/flang/test/Lower/OpenMP/pointer-to-array.f90
@@ -0,0 +1,43 @@
+! Regression test for a crash when compiling the privatizer for a pointer to an array.
+! The crash occurred because the fir.embox was not given a shape, but it needs one.
+
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+! ALLOCATABLE case (2nd subroutine)
+!CHECK-LABEL: omp.private {type = firstprivate}
+!CHECK-SAME: @{{.*}} : !fir.box<!fir.heap<!fir.array<?x!fir.type<{{.*}}>>>> init {
+!CHECK:        if %{{.*}} {
+!CHECK:        %[[SHAPE:.*]] = fir.shape
+!CHECK:        %[[BOX:.*]] = fir.embox %{{.*}}(%[[SHAPE]])
+!CHECK:        } else {
+
+! POINTER case (1st subroutine)
+!CHECK-LABEL: omp.private {type = firstprivate}
+!CHECK-SAME: @{{.*}} : !fir.box<!fir.ptr<!fir.array<?x!fir.type<{{.*}}>>>> init {
+!CHECK:        %[[SHAPE:.*]] = fir.shape
+!CHECK:        %[[ADDR:.*]] = fir.zero_bits
+!CHECK:        %[[BOX:.*]] = fir.embox %[[ADDR]](%[[SHAPE]])
+
+subroutine pointer_to_array_derived
+  type t
+    integer :: i
+  end type
+  type(t), pointer :: a(:)
+  allocate(a(1))
+  a(1)%i = 2
+  !$omp parallel firstprivate(a)
+  if (a(1)%i/=2) stop 2
+  !$omp end parallel
+end subroutine
+
+subroutine allocatable_array_derived
+  type t
+    integer :: i
+  end type
+  type(t), allocatable :: a(:)
+  allocate(a(1))
+  a(1)%i = 2
+  !$omp parallel firstprivate(a)
+  if (a(1)%i/=2) stop 2
+  !$omp end parallel
+end subroutine
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90
index ce45d09d77a22a..f0daef1a4a3503 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90
@@ -44,7 +44,9 @@ program reduce15
 ! CHECK:           %[[VAL_6:.*]] = arith.constant 0 : i64
 ! CHECK:           %[[VAL_7:.*]] = arith.cmpi eq, %[[VAL_5]], %[[VAL_6]] : i64
 ! CHECK:           fir.if %[[VAL_7]] {
-! CHECK:             %[[VAL_8:.*]] = fir.embox %[[VAL_4]] : (!fir.heap<!fir.array<?xi32>>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
+! CHECK:             %[[C0:.*]] = arith.constant 0 : index
+! CHECK:             %[[SHAPE:.*]] = fir.shape %[[C0]]
+! CHECK:             %[[VAL_8:.*]] = fir.embox %[[VAL_4]](%[[SHAPE]]) : (!fir.heap<!fir.array<?xi32>>, !fir.shape<1>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
 ! CHECK:             fir.store %[[VAL_8]] to %[[ALLOC]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
 ! CHECK:           } else {
 ! CHECK:             %[[VAL_9:.*]] = arith.constant 0 : index
@@ -103,7 +105,9 @@ program reduce15
 ! CHECK:           %[[VAL_6:.*]] = arith.constant 0 : i64
 ! CHECK:           %[[VAL_7:.*]] = arith.cmpi eq, %[[VAL_5]], %[[VAL_6]] : i64
 ! CHECK:           fir.if %[[VAL_7]] {
-! CHECK:             %[[VAL_8:.*]] = fir.embox %[[VAL_4]] : (!fir.heap<!fir.array<?xi32>>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
+! CHECK:             %[[C0:.*]] = arith.constant 0 : index
+! CHECK:             %[[SHAPE:.*]] = fir.shape %[[C0]]
+! CHECK:             %[[VAL_8:.*]] = fir.embox %[[VAL_4]](%[[SHAPE]]) : (!fir.heap<!fir.array<?xi32>>, !fir.shape<1>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
 ! CHECK:             fir.store %[[VAL_8]] to %[[ALLOC]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
 ! CHECK:           } else {
 ! CHECK:             %[[VAL_9:.*]] = arith.constant 0 : index

>From 0015bb3b817f557914a610659a55569b5eb14430 Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Fri, 13 Dec 2024 17:12:15 +0000
Subject: [PATCH 07/12] Support derived types

There are three cases handled:
- Boxed (maybe allocatable or pointer) scalar derived types
- Boxed (maybe allocatable or pointer) arrays of derived types
- Unboxed scalar derived types

Currently I support both boxed and unboxed derived types. Unboxed
derived types aren't hlfir::Entities, so I worry there could be cases
where they are in fact boxed.

The changes to parallel-private-clause.f90 are because the arrays are now
boxed. I re-organised the test a bit because the CHECK-DAGs were matching
entirely the wrong lines, outside of the parallel region.
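
This patch also changes the `needsInitialization` condition in
`DataSharingProcessor::doPrivatize` (see the DataSharingProcessor.cpp hunk).
The sketch below models it in C++, with plain booleans standing in for the
real queries. `PrivatizedSymbol` and this free function are hypothetical and
exist only for illustration. The rationale comment is our reading of the
change, not something the patch states.

```cpp
#include <cassert>

// Hypothetical inputs mirroring the queries made in doPrivatize.
struct PrivatizedSymbol {
  bool hasDefaultInitialization;    // Fortran::lower::hasDefaultInitialization
  bool isFirstPrivate;              // Symbol::Flag::OmpFirstPrivate
  bool mayHaveAllocatableComponent; // hlfir::mayHaveAllocatableComponent
  bool isBox;                       // mlir::isa<fir::BaseBoxType>
  bool isBoxChar;                   // mlir::isa<fir::BoxCharType>
};

bool needsInitialization(const PrivatizedSymbol &s) {
  // Default initialization counts for private, but for firstprivate only
  // when allocatable components may exist (presumed rationale: the copy
  // region overwrites the value, but components must be set up first).
  bool defaultInitNeeded =
      s.hasDefaultInitialization &&
      (!s.isFirstPrivate || s.mayHaveAllocatableComponent);
  // Boxed and boxchar values always get an init region.
  return defaultInitNeeded || s.isBox || s.isBoxChar;
}
```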
---
 .../lib/Lower/OpenMP/DataSharingProcessor.cpp |  12 +--
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 100 ++++++++++++++----
 .../lib/Lower/OpenMP/PrivateReductionUtils.h  |   9 +-
 flang/test/Integration/OpenMP/copyprivate.f90 |  18 ++--
 .../Lower/OpenMP/default-clause-byref.f90     |  42 ++------
 .../delayed-privatization-default-init.f90    |  19 ++--
 .../Lower/OpenMP/firstprivate-alloc-comp.f90  |   2 +-
 .../Lower/OpenMP/parallel-private-clause.f90  |  33 +++---
 8 files changed, 129 insertions(+), 106 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index fce2b7fbd4cd25..b48b115b7a67cd 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -20,6 +20,7 @@
 #include "flang/Optimizer/Builder/BoxValue.h"
 #include "flang/Optimizer/Builder/HLFIRTools.h"
 #include "flang/Optimizer/Builder/Todo.h"
+#include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Semantics/attr.h"
 #include "flang/Semantics/tools.h"
@@ -565,7 +566,8 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
 
     // Populate the `init` region.
     const bool needsInitialization =
-        Fortran::lower::hasDefaultInitialization(sym->GetUltimate()) ||
+        (Fortran::lower::hasDefaultInitialization(sym->GetUltimate()) &&
+         (!isFirstPrivate || hlfir::mayHaveAllocatableComponent(allocType))) ||
         mlir::isa<fir::BaseBoxType>(allocType) ||
         mlir::isa<fir::BoxCharType>(allocType);
     if (needsInitialization) {
@@ -573,16 +575,10 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
       mlir::Block *initBlock = firOpBuilder.createBlock(
           &initRegion, /*insertPt=*/{}, {argType, argType}, {symLoc, symLoc});
 
-      if (fir::isa_derived(allocType))
-        TODO(symLoc, "Privatization init of derived types");
-      if (Fortran::lower::hasDefaultInitialization(sym->GetUltimate()))
-        TODO(symLoc,
-             "Privatization init of symbol with default initialization");
-
       populateByRefInitAndCleanupRegions(
           firOpBuilder, symLoc, argType, /*scalarInitValue=*/nullptr, initBlock,
           result.getInitPrivateArg(), result.getInitMoldArg(),
-          result.getDeallocRegion(), /*isPrivate=*/true);
+          result.getDeallocRegion(), /*isPrivate=*/true, sym);
     }
 
     // Populate the `copy` region if this is a `firstprivate`.
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 41c70debed3b4d..85ff5dcee0990c 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -12,21 +12,34 @@
 
 #include "PrivateReductionUtils.h"
 
+#include "flang/Lower/ConvertVariable.h"
 #include "flang/Optimizer/Builder/BoxValue.h"
 #include "flang/Optimizer/Builder/Character.h"
 #include "flang/Optimizer/Builder/FIRBuilder.h"
 #include "flang/Optimizer/Builder/HLFIRTools.h"
+#include "flang/Optimizer/Builder/Runtime/Derived.h"
 #include "flang/Optimizer/Builder/Todo.h"
 #include "flang/Optimizer/Dialect/FIROps.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/Support/FatalError.h"
+#include "flang/Semantics/symbol.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/Location.h"
 
+static bool hasFinalization(const Fortran::semantics::Symbol &sym) {
+  if (sym.has<Fortran::semantics::ObjectEntityDetails>())
+    if (const Fortran::semantics::DeclTypeSpec *declTypeSpec = sym.GetType())
+      if (const Fortran::semantics::DerivedTypeSpec *derivedTypeSpec =
+              declTypeSpec->AsDerived())
+        return Fortran::semantics::IsFinalizable(*derivedTypeSpec);
+  return false;
+}
+
 static void createCleanupRegion(fir::FirOpBuilder &builder, mlir::Location loc,
-                                mlir::Type argType,
-                                mlir::Region &cleanupRegion) {
+                                mlir::Type argType, mlir::Region &cleanupRegion,
+                                const Fortran::semantics::Symbol *sym) {
   assert(cleanupRegion.empty());
   mlir::Block *block = builder.createBlock(&cleanupRegion, cleanupRegion.end(),
                                            {argType}, {loc});
@@ -41,12 +54,6 @@ static void createCleanupRegion(fir::FirOpBuilder &builder, mlir::Location loc,
 
   mlir::Type valTy = fir::unwrapRefType(argType);
   if (auto boxTy = mlir::dyn_cast_or_null<fir::BaseBoxType>(valTy)) {
-    if (!mlir::isa<fir::HeapType, fir::PointerType>(boxTy.getEleTy())) {
-      mlir::Type innerTy = fir::extractSequenceType(boxTy);
-      if (!mlir::isa<fir::SequenceType>(innerTy))
-        typeError();
-    }
-
     mlir::Value arg = builder.loadIfRef(loc, block->getArgument(0));
     assert(mlir::isa<fir::BaseBoxType>(arg.getType()));
 
@@ -138,13 +145,20 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
     mlir::Value scalarInitValue, mlir::Block *initBlock,
     mlir::Value allocatedPrivVarArg, mlir::Value moldArg,
-    mlir::Region &cleanupRegion, bool isPrivate) {
+    mlir::Region &cleanupRegion, bool isPrivate,
+    const Fortran::semantics::Symbol *sym) {
   mlir::Type ty = fir::unwrapRefType(argType);
   builder.setInsertionPointToEnd(initBlock);
   auto yield = [&](mlir::Value ret) {
     builder.create<mlir::omp::YieldOp>(loc, ret);
   };
 
+  if (isPrivate)
+    assert(sym && "Symbol information is needed to privatize derived types");
+  bool needsInitialization =
+      sym ? Fortran::lower::hasDefaultInitialization(sym->GetUltimate())
+          : false;
+
   if (fir::isa_trivial(ty)) {
     builder.setInsertionPointToEnd(initBlock);
 
@@ -214,19 +228,30 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     }
 
     moldArg = builder.loadIfRef(loc, moldArg);
-    hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg}, lenParams);
+    // We pass derived types unboxed, so they are not self-contained entities.
+    if (hlfir::isFortranEntity(moldArg))
+      hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg},
+                                 lenParams);
 
     mlir::Type innerTy = fir::unwrapRefType(boxTy.getEleTy());
+    bool isDerived = fir::isa_derived(innerTy);
     bool isChar = fir::isa_char(innerTy);
-    if (fir::isa_trivial(innerTy) || isChar) {
+    if (fir::isa_trivial(innerTy) || isDerived || isChar) {
       // boxed non-sequence value e.g. !fir.box<!fir.heap<i32>>
-      if (!isAllocatableOrPointer)
-        TODO(loc,
-             "Reduction/Privatization of non-allocatable trivial typed box");
+      if (!isAllocatableOrPointer && !isDerived)
+        TODO(loc, "Reduction/Privatization of non-allocatable trivial or "
+                  "character typed box");
 
-      fir::IfOp ifUnallocated = handleNullAllocatable(boxAlloca, moldArg);
+      if ((isDerived || isChar) && (!isPrivate || scalarInitValue))
+        TODO(loc, "Reduction of an unsupported boxed type");
+
+      fir::IfOp ifUnallocated{nullptr};
+      if (isAllocatableOrPointer) {
+        ifUnallocated = handleNullAllocatable(boxAlloca, moldArg);
+        builder.setInsertionPointToStart(
+            &ifUnallocated.getElseRegion().front());
+      }
 
-      builder.setInsertionPointToStart(&ifUnallocated.getElseRegion().front());
       mlir::Value valAlloc = builder.createHeapTemporary(
           loc, innerTy, /*name=*/{}, /*shape=*/{}, lenParams);
       if (scalarInitValue)
@@ -234,19 +259,31 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
       mlir::Value box = builder.create<fir::EmboxOp>(
           loc, ty, valAlloc, /*shape=*/mlir::Value{}, /*slice=*/mlir::Value{},
           lenParams);
-      builder.create<fir::StoreOp>(loc, box, boxAlloca);
+      if (needsInitialization)
+        fir::runtime::genDerivedTypeInitialize(builder, loc, box);
+      fir::StoreOp lastOp = builder.create<fir::StoreOp>(loc, box, boxAlloca);
 
-      createCleanupRegion(builder, loc, argType, cleanupRegion);
-      builder.setInsertionPointAfter(ifUnallocated);
+      createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
+
+      if (ifUnallocated)
+        builder.setInsertionPointAfter(ifUnallocated);
+      else
+        builder.setInsertionPointAfter(lastOp);
       yield(boxAlloca);
       return;
     }
+
     innerTy = fir::extractSequenceType(boxTy);
     if (!innerTy || !mlir::isa<fir::SequenceType>(innerTy))
       TODO(loc, "Unsupported boxed type for reduction/privatization");
 
     moldArg = builder.loadIfRef(loc, moldArg);
-    hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg}, lenParams);
+    // We pass derived types unboxed, so they are not self-contained entities.
+    // Assume that if length parameters are required, they will be boxed by
+    // lowering.
+    if (hlfir::isFortranEntity(moldArg))
+      hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg},
+                                 lenParams);
 
     fir::IfOp ifUnallocated{nullptr};
     if (isAllocatableOrPointer) {
@@ -274,7 +311,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
            "createTempFromMold decides this statically");
     if (cstNeedsDealloc.has_value() && *cstNeedsDealloc != false) {
       mlir::OpBuilder::InsertionGuard guard(builder);
-      createCleanupRegion(builder, loc, argType, cleanupRegion);
+      createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
     } else {
       assert(!isAllocatableOrPointer &&
              "Pointer-like arrays must be heap allocated");
@@ -298,6 +335,9 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
 
     if (scalarInitValue)
       builder.create<hlfir::AssignOp>(loc, scalarInitValue, box);
+    if (needsInitialization)
+      fir::runtime::genDerivedTypeInitialize(builder, loc, box);
+
     builder.create<fir::StoreOp>(loc, box, boxAlloca);
     if (ifUnallocated)
       builder.setInsertionPointAfter(ifUnallocated);
@@ -323,13 +363,29 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
         loc, eleTy, /*name=*/{}, /*shape=*/{}, /*lenParams=*/len);
     mlir::Value boxChar = charExprHelper.createEmboxChar(privateAddr, len);
 
-    createCleanupRegion(builder, loc, argType, cleanupRegion);
+    createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
 
     builder.setInsertionPointToEnd(initBlock);
     yield(boxChar);
     return;
   }
 
+  if (fir::isa_derived(ty)) {
+    if (needsInitialization) {
+      builder.setInsertionPointToStart(initBlock);
+      mlir::Type boxedTy = fir::BoxType::get(ty);
+      mlir::Value box =
+          builder.create<fir::EmboxOp>(loc, boxedTy, allocatedPrivVarArg);
+      fir::runtime::genDerivedTypeInitialize(builder, loc, box);
+    }
+    if (sym && hasFinalization(*sym))
+      createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
+
+    builder.setInsertionPointToEnd(initBlock);
+    yield(allocatedPrivVarArg);
+    return;
+  }
+
   TODO(loc,
        "creating reduction/privatization init region for unsupported type");
   return;
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
index b81b00e1784789..b8d0db54ded73c 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
@@ -20,6 +20,12 @@ namespace mlir {
 class Region;
 } // namespace mlir
 
+namespace Fortran {
+namespace semantics {
+class Symbol;
+} // namespace semantics
+} // namespace Fortran
+
 namespace fir {
 class FirOpBuilder;
 class ShapeShiftOp;
@@ -37,7 +43,8 @@ void populateByRefInitAndCleanupRegions(
     fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
     mlir::Value scalarInitValue, mlir::Block *initBlock,
     mlir::Value allocatedPrivVarArg, mlir::Value moldArg,
-    mlir::Region &cleanupRegion, bool isPrivate = false);
+    mlir::Region &cleanupRegion, bool isPrivate = false,
+    const Fortran::semantics::Symbol *sym = nullptr);
 
 /// Generate a fir::ShapeShift op describing the provided boxed array.
 fir::ShapeShiftOp getShapeShift(fir::FirOpBuilder &builder, mlir::Location loc,
diff --git a/flang/test/Integration/OpenMP/copyprivate.f90 b/flang/test/Integration/OpenMP/copyprivate.f90
index d38fd20020f34c..e3732c487a0e2e 100644
--- a/flang/test/Integration/OpenMP/copyprivate.f90
+++ b/flang/test/Integration/OpenMP/copyprivate.f90
@@ -9,17 +9,17 @@
 !RUN: %flang_fc1 -emit-llvm -fopenmp %s -o - | FileCheck %s
 
 !CHECK-DAG: define internal void @_copy_box_Uxi32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
-!CHECK-DAG: define internal void @_copy_10xi32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
+!CHECK-DAG: define internal void @_copy_box_10xi32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_i64(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_box_Uxi64(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_f32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
-!CHECK-DAG: define internal void @_copy_2x3xf32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
+!CHECK-DAG: define internal void @_copy_box_2x3xf32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_z32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
-!CHECK-DAG: define internal void @_copy_10xz32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
+!CHECK-DAG: define internal void @_copy_box_10xz32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_l32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
-!CHECK-DAG: define internal void @_copy_5xl32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
+!CHECK-DAG: define internal void @_copy_box_5xl32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_c8x8(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
-!CHECK-DAG: define internal void @_copy_10xc8x8(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
+!CHECK-DAG: define internal void @_copy_box_10xc8x8(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_c16x5(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_rec__QFtest_typesTdt(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
 !CHECK-DAG: define internal void @_copy_box_heap_Uxi32(ptr nocapture %{{.*}}, ptr nocapture %{{.*}})
@@ -33,8 +33,12 @@
 !CHECK-NEXT:  }
 
 !CHECK-LABEL: define internal void @test_scalar_..omp_par({{.*}})
-!CHECK:         %[[I:.*]] = alloca i32, i64 1
-!CHECK:         %[[J:.*]] = alloca i32, i64 1
+!CHECK-NEXT: omp.par.entry:
+!CHECK:         %[[TID_ADDR:.*]] = alloca i32, align 4
+!CHECK:         %[[I:.*]] = alloca i32, align 4
+!CHECK:         %[[J:.*]] = alloca i32, align 4
+!CHECK:         br label %[[OMP_REDUCTION_INIT:.*]]
+
 !CHECK:         %[[DID_IT:.*]] = alloca i32
 !CHECK:         store i32 0, ptr %[[DID_IT]]
 !CHECK:         %[[THREAD_NUM1:.*]] = call i32 @__kmpc_global_thread_num(ptr @[[LOC:.*]])
diff --git a/flang/test/Lower/OpenMP/default-clause-byref.f90 b/flang/test/Lower/OpenMP/default-clause-byref.f90
index 654c13ada9e39f..168aa1f5394aa8 100644
--- a/flang/test/Lower/OpenMP/default-clause-byref.f90
+++ b/flang/test/Lower/OpenMP/default-clause-byref.f90
@@ -7,57 +7,27 @@
 ! RUN: bbc -fopenmp -emit-hlfir --force-byref-reduction %s -o - \
 ! RUN: | FileCheck %s
 
-!CHECK:  omp.private {type = firstprivate} @[[W_FIRSTPRIVATIZER:_QFEw_firstprivate_ref_i32]] : !fir.ref<i32> alloc {
-!CHECK:  ^bb0(%{{.*}}: !fir.ref<i32>):
-!CHECK:    %[[PRIV_W_ALLOC:.*]] = fir.alloca i32 {bindc_name = "w", {{.*}}}
-!CHECK:    %[[PRIV_W_DECL:.*]]:2 = hlfir.declare %[[PRIV_W_ALLOC]] {uniq_name = "_QFEw"}
-!CHECK:    omp.yield(%[[PRIV_W_DECL]]#0 : !fir.ref<i32>)
-!CHECK:  } copy {
+!CHECK:  omp.private {type = firstprivate} @[[W_FIRSTPRIVATIZER:_QFEw_firstprivate_i32]] : i32 copy {
 !CHECK:  ^bb0(%[[ORIG_W:.*]]: !fir.ref<i32>, %[[PRIV_W:.*]]: !fir.ref<i32>):
 !CHECK:    %[[ORIG_W_VAL:.*]] = fir.load %[[ORIG_W]]
 !CHECK:    hlfir.assign %[[ORIG_W_VAL]] to %[[PRIV_W]]
 !CHECK:    omp.yield(%[[PRIV_W]] : !fir.ref<i32>)
 !CHECK:  }
 
-!CHECK:  omp.private {type = firstprivate} @[[Y_FIRSTPRIVATIZER:_QFEy_firstprivate_ref_i32]] : !fir.ref<i32> alloc {
-!CHECK:  ^bb0(%{{.*}}: !fir.ref<i32>):
-!CHECK:    %[[PRIV_Y_ALLOC:.*]] = fir.alloca i32 {bindc_name = "y", {{.*}}}
-!CHECK:    %[[PRIV_Y_DECL:.*]]:2 = hlfir.declare %[[PRIV_Y_ALLOC]] {uniq_name = "_QFEy"}
-!CHECK:    omp.yield(%[[PRIV_Y_DECL]]#0 : !fir.ref<i32>)
-!CHECK:  } copy {
+!CHECK:  omp.private {type = firstprivate} @[[Y_FIRSTPRIVATIZER:_QFEy_firstprivate_i32]] : i32 copy {
 !CHECK:  ^bb0(%[[ORIG_Y:.*]]: !fir.ref<i32>, %[[PRIV_Y:.*]]: !fir.ref<i32>):
 !CHECK:    %[[ORIG_Y_VAL:.*]] = fir.load %[[ORIG_Y]]
 !CHECK:    hlfir.assign %[[ORIG_Y_VAL]] to %[[PRIV_Y]]
 !CHECK:    omp.yield(%[[PRIV_Y]] : !fir.ref<i32>)
 !CHECK:  }
 
-!CHECK:  omp.private {type = private} @[[X_PRIVATIZER:_QFEx_private_ref_i32]] : !fir.ref<i32> alloc {
-!CHECK:  ^bb0(%{{.*}}: !fir.ref<i32>):
-!CHECK:    %[[PRIV_X_ALLOC:.*]] = fir.alloca i32 {bindc_name = "x", {{.*}}}
-!CHECK:    %[[PRIV_X_DECL:.*]]:2 = hlfir.declare %[[PRIV_X_ALLOC]] {uniq_name = "_QFEx"}
-!CHECK:    omp.yield(%[[PRIV_X_DECL]]#0 : !fir.ref<i32>)
-!CHECK:  }
+!CHECK:  omp.private {type = private} @[[X_PRIVATIZER:_QFEx_private_i32]] : i32
 
-!CHECK:  omp.private {type = private} @[[W_PRIVATIZER:_QFEw_private_ref_i32]] : !fir.ref<i32> alloc {
-!CHECK:  ^bb0(%{{.*}}: !fir.ref<i32>):
-!CHECK:    %[[PRIV_W_ALLOC:.*]] = fir.alloca i32 {bindc_name = "w", {{.*}}}
-!CHECK:    %[[PRIV_W_DECL:.*]]:2 = hlfir.declare %[[PRIV_W_ALLOC]] {uniq_name = "_QFEw"}
-!CHECK:    omp.yield(%[[PRIV_W_DECL]]#0 : !fir.ref<i32>)
-!CHECK:  }
+!CHECK:  omp.private {type = private} @[[W_PRIVATIZER:_QFEw_private_i32]] : i32
 
-!CHECK:  omp.private {type = private} @[[Y_PRIVATIZER:_QFEy_private_ref_i32]] : !fir.ref<i32> alloc {
-!CHECK:  ^bb0(%{{.*}}: !fir.ref<i32>):
-!CHECK:    %[[PRIV_Y_ALLOC:.*]] = fir.alloca i32 {bindc_name = "y", {{.*}}}
-!CHECK:    %[[PRIV_Y_DECL:.*]]:2 = hlfir.declare %[[PRIV_Y_ALLOC]] {uniq_name = "_QFEy"}
-!CHECK:    omp.yield(%[[PRIV_Y_DECL]]#0 : !fir.ref<i32>)
-!CHECK:  }
+!CHECK:  omp.private {type = private} @[[Y_PRIVATIZER:_QFEy_private_i32]] : i32
 
-!CHECK:  omp.private {type = firstprivate} @[[X_FIRSTPRIVATIZER:_QFEx_firstprivate_ref_i32]] : !fir.ref<i32> alloc {
-!CHECK:  ^bb0(%{{.*}}: !fir.ref<i32>):
-!CHECK:    %[[PRIV_X_ALLOC:.*]] = fir.alloca i32 {bindc_name = "x", {{.*}}}
-!CHECK:    %[[PRIV_X_DECL:.*]]:2 = hlfir.declare %[[PRIV_X_ALLOC]] {uniq_name = "_QFEx"}
-!CHECK:    omp.yield(%[[PRIV_X_DECL]]#0 : !fir.ref<i32>)
-!CHECK:  } copy {
+!CHECK:  omp.private {type = firstprivate} @[[X_FIRSTPRIVATIZER:_QFEx_firstprivate_i32]] : i32 copy {
 !CHECK:  ^bb0(%[[ORIG_X:.*]]: !fir.ref<i32>, %[[PRIV_X:.*]]: !fir.ref<i32>):
 !CHECK:    %[[ORIG_X_VAL:.*]] = fir.load %[[ORIG_X]]
 !CHECK:    hlfir.assign %[[ORIG_X_VAL]] to %[[PRIV_X]]
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-default-init.f90 b/flang/test/Lower/OpenMP/delayed-privatization-default-init.f90
index 022b592db74b81..87d4605217a8a0 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-default-init.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-default-init.f90
@@ -29,19 +29,16 @@ subroutine delayed_privatization_default_init_firstprivate
   !$omp end parallel
 end subroutine
 
-! CHECK-LABEL:   omp.private {type = firstprivate} @_QFdelayed_privatization_default_init_firstprivateEa_firstprivate_ref_rec__QFdelayed_privatization_default_init_firstprivateTt : !fir.ref<!fir.type<_QFdelayed_privatization_default_init_firstprivateTt{i:i32}>> alloc {
-! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.type<_QFdelayed_privatization_default_init_firstprivateTt{i:i32}>>):
-! CHECK:           %[[VAL_1:.*]] = fir.alloca !fir.type<_QFdelayed_privatization_default_init_firstprivateTt{i:i32}> {bindc_name = "a", pinned, uniq_name = "_QFdelayed_privatization_default_init_firstprivateEa"}
-! CHECK-NEXT:      %[[VAL_9:.*]]:2 = hlfir.declare %[[VAL_1]] {uniq_name = "_QFdelayed_privatization_default_init_firstprivateEa"} : (!fir.ref<!fir.type<_QFdelayed_privatization_default_init_firstprivateTt{i:i32}>>) -> (!fir.ref<!fir.type<_QFdelayed_privatization_default_init_firstprivateTt{i:i32}>>, !fir.ref<!fir.type<_QFdelayed_privatization_default_init_firstprivateTt{i:i32}>>)
-! CHECK:           omp.yield(%[[VAL_9]]#0 : !fir.ref<!fir.type<_QFdelayed_privatization_default_init_firstprivateTt{i:i32}>>)
-! CHECK:   }
+! CHECK-LABEL:   omp.private {type = firstprivate}
+! CHECK-SAME:        @_QFdelayed_privatization_default_init_firstprivateEa_firstprivate_rec__QFdelayed_privatization_default_init_firstprivateTt :
+! CHECK-SAME:        [[TYPE:!fir.type<_QFdelayed_privatization_default_init_firstprivateTt{i:i32}>]] copy {
 
-! CHECK-LABEL:   omp.private {type = private} @_QFdelayed_privatization_default_initEa_private_ref_rec__QFdelayed_privatization_default_initTt : !fir.ref<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>> alloc {
-! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>):
-! CHECK:           %[[VAL_1:.*]] = fir.alloca !fir.type<_QFdelayed_privatization_default_initTt{i:i32}> {bindc_name = "a", pinned, uniq_name = "_QFdelayed_privatization_default_initEa"}
+! CHECK-LABEL:   omp.private {type = private}
+! CHECK-SAME:        @_QFdelayed_privatization_default_initEa_private_rec__QFdelayed_privatization_default_initTt :
+! CHECK-SAME:        [[TYPE:!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>]] init {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<[[TYPE]]>, %[[VAL_1:.*]]: !fir.ref<[[TYPE]]>):
 ! CHECK:           %[[VAL_2:.*]] = fir.embox %[[VAL_1]] : (!fir.ref<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>) -> !fir.box<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>
 ! CHECK:           %[[VAL_6:.*]] = fir.convert %[[VAL_2]] : (!fir.box<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>) -> !fir.box<none>
 ! CHECK:           fir.call @_FortranAInitialize(%[[VAL_6]],{{.*}}
-! CHECK-NEXT:      %[[VAL_9:.*]]:2 = hlfir.declare %[[VAL_1]] {uniq_name = "_QFdelayed_privatization_default_initEa"} : (!fir.ref<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>) -> (!fir.ref<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>, !fir.ref<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>)
-! CHECK:           omp.yield(%[[VAL_9]]#0 : !fir.ref<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>)
+! CHECK:           omp.yield(%[[VAL_1]] : !fir.ref<!fir.type<_QFdelayed_privatization_default_initTt{i:i32}>>)
 ! CHECK:   }
diff --git a/flang/test/Lower/OpenMP/firstprivate-alloc-comp.f90 b/flang/test/Lower/OpenMP/firstprivate-alloc-comp.f90
index 2453fe2c5208bc..4d0a2a0b90243c 100644
--- a/flang/test/Lower/OpenMP/firstprivate-alloc-comp.f90
+++ b/flang/test/Lower/OpenMP/firstprivate-alloc-comp.f90
@@ -13,7 +13,7 @@ subroutine firstprivate_alloc_comp
 
   call firstprivate_alloc_comp()
 end
-! CHECK-LABEL:   omp.private {type = firstprivate} @_QFfirstprivate_alloc_compEx_firstprivate_ref_rec__QFfirstprivate_alloc_compTt1 : !fir.ref<!fir.type<_QFfirstprivate_alloc_compTt1{c:!fir.box<!fir.heap<!fir.array<?xi32>>>}>> alloc {
+! CHECK-LABEL:   omp.private {type = firstprivate} @_QFfirstprivate_alloc_compEx_firstprivate_rec__QFfirstprivate_alloc_compTt1 : !fir.type<_QFfirstprivate_alloc_compTt1{c:!fir.box<!fir.heap<!fir.array<?xi32>>>}> init {
 ! CHECK:     fir.call @_FortranAInitialize(
 ! CHECK:   } copy {
 ! ...
diff --git a/flang/test/Lower/OpenMP/parallel-private-clause.f90 b/flang/test/Lower/OpenMP/parallel-private-clause.f90
index 7114314df05d3a..3ed2efb2b5922c 100644
--- a/flang/test/Lower/OpenMP/parallel-private-clause.f90
+++ b/flang/test/Lower/OpenMP/parallel-private-clause.f90
@@ -5,12 +5,10 @@
 ! RUN: bbc --use-desc-for-alloc=false -fopenmp -emit-hlfir %s -o - \
 ! RUN: | FileCheck %s --check-prefix=FIRDialect
 
-! FIRDialect: omp.private {type = private} @_QFsimd_loop_1Er_private_ref_box_heap_f32 {{.*}} alloc {
-! FIRDialect:     [[R:%.*]] = fir.alloca !fir.box<!fir.heap<f32>> {bindc_name = "r", pinned, uniq_name = "{{.*}}Er"}
+! FIRDialect: omp.private {type = private} @_QFsimd_loop_1Er_private_box_heap_f32 : !fir.box<!fir.heap<f32>> init {
+! FIRDialect:     fir.store {{%.*}} to [[R:.*]] : !fir.ref<!fir.box<!fir.heap<f32>>>
 ! FIRDialect:     fir.store {{%.*}} to [[R]] : !fir.ref<!fir.box<!fir.heap<f32>>>
-! FIRDialect:     fir.store {{%.*}} to [[R]] : !fir.ref<!fir.box<!fir.heap<f32>>>
-! FIRDialect:     [[R_DECL:%.*]]:2 = hlfir.declare [[R]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "{{.*}}r"} : (!fir.ref<!fir.box<!fir.heap<f32>>>) -> (!fir.ref<!fir.box<!fir.heap<f32>>>, !fir.ref<!fir.box<!fir.heap<f32>>>)
-! FIRDialect:     omp.yield([[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>)
+! FIRDialect:     omp.yield([[R]] : !fir.ref<!fir.box<!fir.heap<f32>>>)
 ! FIRDialect:   } dealloc {
 ! FIRDialect:  ^bb0([[R_DECL:%.*]]: !fir.ref<!fir.box<!fir.heap<f32>>>):
 ! FIRDialect:     {{%.*}} = fir.load [[R_DECL]] : !fir.ref<!fir.box<!fir.heap<f32>>>
@@ -18,36 +16,31 @@
 ! FIRDialect:     [[LD:%.*]] = fir.load [[R_DECL]] : !fir.ref<!fir.box<!fir.heap<f32>>>
 ! FIRDialect:     [[AD:%.*]] = fir.box_addr [[LD]] : (!fir.box<!fir.heap<f32>>) -> !fir.heap<f32>
 ! FIRDialect:     fir.freemem [[AD]] : !fir.heap<f32>
-! FIRDialect:     fir.store {{%.*}} to [[R_DECL]] : !fir.ref<!fir.box<!fir.heap<f32>>>
 ! FIRDialect:     omp.yield
 ! FIRDialect:   }
 
-!FIRDialect: omp.private {type = private} @[[DERIVED_PRIVATIZER:_QFprivate_clause_derived_typeEt_private_ref_rec__QFprivate_clause_derived_typeTmy_type]] : !fir.ref<!fir.type<_QFprivate_clause_derived_typeTmy_type{t_i:i32,t_arr:!fir.array<5xi32>}>> alloc {
-!FIRDialect:   ^bb0(%{{.*}}: !fir.ref<!fir.type<_QFprivate_clause_derived_typeTmy_type{t_i:i32,t_arr:!fir.array<5xi32>}>>):
-!FIRDialect:     %[[PRIV_ALLOC:.*]] = fir.alloca !fir.type<_QFprivate_clause_derived_typeTmy_type{t_i:i32,t_arr:!fir.array<5xi32>}> {bindc_name = "t", pinned, uniq_name = "_QFprivate_clause_derived_typeEt"}
-!FIRDialect:     %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]] {uniq_name = "_QFprivate_clause_derived_typeEt"} : (!fir.ref<!fir.type<_QFprivate_clause_derived_typeTmy_type{t_i:i32,t_arr:!fir.array<5xi32>}>>) -> (!fir.ref<!fir.type<_QFprivate_clause_derived_typeTmy_type{t_i:i32,t_arr:!fir.array<5xi32>}>>, !fir.ref<!fir.type<_QFprivate_clause_derived_typeTmy_type{t_i:i32,t_arr:!fir.array<5xi32>}>>)
-!FIRDialect:     omp.yield(%[[PRIV_DECL]]#0 : !fir.ref<!fir.type<_QFprivate_clause_derived_typeTmy_type{t_i:i32,t_arr:!fir.array<5xi32>}>>)
-!FIRDialect: }
+!FIRDialect: omp.private {type = private} @[[DERIVED_PRIVATIZER:_QFprivate_clause_derived_typeEt_private_rec__QFprivate_clause_derived_typeTmy_type]] : !fir.type<_QFprivate_clause_derived_typeTmy_type{t_i:i32,t_arr:!fir.array<5xi32>}>
 
 !FIRDialect: func @_QPprivate_clause(%[[ARG1:.*]]: !fir.ref<i32> {fir.bindc_name = "arg1"}, %[[ARG2:.*]]: !fir.ref<!fir.array<10xi32>> {fir.bindc_name = "arg2"}, %[[ARG3:.*]]: !fir.boxchar<1> {fir.bindc_name = "arg3"}, %[[ARG4:.*]]: !fir.boxchar<1> {fir.bindc_name = "arg4"}) {
 !FIRDialect-DAG: %[[ALPHA:.*]] = fir.alloca i32 {bindc_name = "alpha", uniq_name = "{{.*}}alpha"}
-!FIRDialect-DAG: %[[ALPHA_DECL:.*]]:2 = hlfir.declare %[[ALPHA]] {uniq_name = "{{.*}}alpha"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !FIRDialect-DAG: %[[ALPHA_ARRAY:.*]] = fir.alloca !fir.array<10xi32> {bindc_name = "alpha_array", uniq_name = "{{.*}}alpha_array"}
-!FIRDialect-DAG: %[[ALPHA_ARRAY_DECL:.*]]:2 = hlfir.declare %[[ALPHA_ARRAY]]({{.*}}) {uniq_name = "{{.*}}alpha_array"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
 !FIRDialect-DAG: %[[BETA:.*]] = fir.alloca !fir.char<1,5> {bindc_name = "beta", uniq_name = "{{.*}}beta"}
-!FIRDialect-DAG: %[[BETA_DECL:.*]]:2 = hlfir.declare %[[BETA]] typeparams {{.*}} {uniq_name = "{{.*}}beta"} : (!fir.ref<!fir.char<1,5>>, index) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
 !FIRDialect-DAG: %[[BETA_ARRAY:.*]] = fir.alloca !fir.array<10x!fir.char<1,5>> {bindc_name = "beta_array", uniq_name = "{{.*}}beta_array"}
+
+!FIRDialect-DAG: %[[ALPHA_DECL:.*]]:2 = hlfir.declare %[[ALPHA]] {uniq_name = "{{.*}}alpha"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!FIRDialect-DAG: %[[ALPHA_ARRAY_DECL:.*]]:2 = hlfir.declare %[[ALPHA_ARRAY]]({{.*}}) {uniq_name = "{{.*}}alpha_array"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
+!FIRDialect-DAG: %[[BETA_DECL:.*]]:2 = hlfir.declare %[[BETA]] typeparams {{.*}} {uniq_name = "{{.*}}beta"} : (!fir.ref<!fir.char<1,5>>, index) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
 !FIRDialect-DAG: %[[BETA_ARRAY_DECL:.*]]:2 = hlfir.declare %[[BETA_ARRAY]]({{.*}}) typeparams {{.*}} {uniq_name = "{{.*}}beta_array"} : (!fir.ref<!fir.array<10x!fir.char<1,5>>>, !fir.shape<1>, index) -> (!fir.ref<!fir.array<10x!fir.char<1,5>>>, !fir.ref<!fir.array<10x!fir.char<1,5>>>)
 
-!FIRDialect-DAG: omp.parallel private(@{{.*}} %{{.*}}#0 -> %[[ALPHA_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[ALPHA_ARRAY_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[BETA_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[BETA_ARRAY_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[ARG1_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[ARG2_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[ARG3_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[ARG4_PVT:.*]] : {{.*}}) {
+!FIRDialect:     omp.parallel private(@{{.*}} %[[ALPHA_DECL]]#0 -> %[[ALPHA_PVT:.*]], @{{.*}} %{{.*}} -> %[[ALPHA_ARRAY_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[BETA_PVT:.*]], @{{.*}} %{{.*}} -> %[[BETA_ARRAY_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[ARG1_PVT:.*]], @{{.*}} %{{.*}} -> %[[ARG2_PVT:.*]], @{{.*}} %{{.*}}#0 -> %[[ARG3_PVT:.*]], @{{.*}} %{{.*}} -> %[[ARG4_PVT:.*]] : {{.*}}) {
 !FIRDialect-DAG:  %[[ALPHA_PVT_DECL:.*]]:2 = hlfir.declare %[[ALPHA_PVT]] {uniq_name = "{{.*}}alpha"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!FIRDialect-DAG:  %[[ALPHA_ARRAY_PVT_DECL:.*]]:2 = hlfir.declare %[[ALPHA_ARRAY_PVT]]({{.*}}) {uniq_name = "{{.*}}alpha_array"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
+!FIRDialect-DAG:  %[[ALPHA_ARRAY_PVT_DECL:.*]]:2 = hlfir.declare %[[ALPHA_ARRAY_PVT]] {uniq_name = "{{.*}}alpha_array"} :
 !FIRDialect-DAG:  %[[BETA_PVT_DECL:.*]]:2 = hlfir.declare %[[BETA_PVT]] typeparams {{.*}} {uniq_name = "{{.*}}beta"} : (!fir.ref<!fir.char<1,5>>, index) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
-!FIRDialect-DAG:  %[[BETA_ARRAY_PVT_DECL:.*]]:2 = hlfir.declare %[[BETA_ARRAY_PVT]]({{.*}}) typeparams {{.*}} {uniq_name = "{{.*}}beta_array"} : (!fir.ref<!fir.array<10x!fir.char<1,5>>>, !fir.shape<1>, index) -> (!fir.ref<!fir.array<10x!fir.char<1,5>>>, !fir.ref<!fir.array<10x!fir.char<1,5>>>)
+!FIRDialect-DAG:  %[[BETA_ARRAY_PVT_DECL:.*]]:2 = hlfir.declare %[[BETA_ARRAY_PVT]] {uniq_name = "{{.*}}beta_array"} :
 !FIRDialect-DAG:  %[[ARG1_PVT_DECL:.*]]:2 = hlfir.declare %[[ARG1_PVT]] {uniq_name = "{{.*}}arg1"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!FIRDialect-DAG:  %[[ARG2_PVT_DECL:.*]]:2 = hlfir.declare %[[ARG2_PVT]]({{.*}}) {uniq_name = "{{.*}}arg2"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
+!FIRDialect-DAG:  %[[ARG2_PVT_DECL:.*]]:2 = hlfir.declare %[[ARG2_PVT]] {uniq_name = "{{.*}}arg2"} :
 !FIRDialect-DAG:  %[[ARG3_PVT_DECL:.*]]:2 = hlfir.declare %[[ARG3_PVT]] typeparams {{.*}} {uniq_name = "{{.*}}arg3"} : (!fir.ref<!fir.char<1,5>>, index) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
-!FIRDialect-DAG:  %[[ARG4_PVT_DECL:.*]]:2 = hlfir.declare %[[ARG4_PVT]]({{.*}}) typeparams {{.*}} {uniq_name = "{{.*}}arg4"} : (!fir.ref<!fir.array<10x!fir.char<1,5>>>, !fir.shape<1>, index) -> (!fir.ref<!fir.array<10x!fir.char<1,5>>>, !fir.ref<!fir.array<10x!fir.char<1,5>>>)
+!FIRDialect-DAG:  %[[ARG4_PVT_DECL:.*]]:2 = hlfir.declare %[[ARG4_PVT]] {uniq_name = "{{.*}}arg4"} :
 !FIRDialect:     omp.terminator
 !FIRDialect:  }
 

>From 4d52dde46c2e121659415afa9d5cd0501aadb38c Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Fri, 10 Jan 2025 18:00:35 +0000
Subject: [PATCH 08/12] Implement lupori's derived type initialization patch

For reference see both
https://github.com/llvm/llvm-project/pull/120295 and
https://github.com/llvm/llvm-project/pull/121808

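This changes the barrier logic: a lastprivate variable now needs a barrier
whenever a privatizer's init region might read the mold argument, not only
when initializeCloneAtRuntime is called.
See discussion here: https://github.com/llvm/llvm-project/pull/120295/files#r1910663473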
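
The C++ below is a minimal sketch of the decisions this patch makes. It uses
hypothetical stand-in types (InitCalls, PrivatizedSym, planDerivedTypeInit,
readsMold) in place of flang's FIR builder and semantics::Symbol. It mirrors
two functions. The first is initializeIfDerivedTypeBox(), which emits
_FortranAInitialize for types with default initialization, and
_FortranAInitializeClone for types that may have allocatable components
unless the variable is firstprivate. The second is needBarrier(), which now
depends on whether an init region might read the mold argument. The sketch
treats "emits InitializeClone" as the only way to read the mold, which is a
simplification: the patch checks for any use of the mold block argument.

```cpp
#include <vector>

// Same enumerators as the DeclOperationKind added in PrivateReductionUtils.h.
enum class DeclOperationKind { Private, FirstPrivate, Reduction };

// Hypothetical record of which runtime calls the init region would contain.
struct InitCalls {
  bool initialize = false;      // _FortranAInitialize(newBox)
  bool initializeClone = false; // _FortranAInitializeClone(newBox, moldBox)
};

// Models initializeIfDerivedTypeBox() once the type is known to be derived.
InitCalls planDerivedTypeInit(bool hasDefaultInit, bool mayHaveAllocComponent,
                              DeclOperationKind kind) {
  InitCalls calls;
  calls.initialize = hasDefaultInit;
  // As in the patch, the clone is not emitted for firstprivate variables.
  calls.initializeClone =
      mayHaveAllocComponent && kind != DeclOperationKind::FirstPrivate;
  return calls;
}

// Simplification: here only InitializeClone reads the mold argument.
bool readsMold(const InitCalls &calls) { return calls.initializeClone; }

// Hypothetical stand-in for the OpenMP flags queried on each symbol.
struct PrivatizedSym {
  bool lastPrivate;
  bool firstPrivate;
};

// Models DataSharingProcessor::needBarrier() after this patch.
bool needBarrier(const std::vector<PrivatizedSym> &syms,
                 bool mightHaveReadMoldArg) {
  for (const PrivatizedSym &sym : syms)
    if (sym.lastPrivate && (sym.firstPrivate || mightHaveReadMoldArg))
      return true;
  return false;
}
```

Take a private variable whose type has both default initialization and an
allocatable component. Both calls are planned for it, so the init region reads
the mold, and any lastprivate variable in the same construct then needs a
barrier.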
---
 .../lib/Lower/OpenMP/DataSharingProcessor.cpp | 14 ++-
 flang/lib/Lower/OpenMP/DataSharingProcessor.h |  2 +-
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 91 +++++++++++++++----
 .../lib/Lower/OpenMP/PrivateReductionUtils.h  | 11 ++-
 flang/lib/Lower/OpenMP/ReductionProcessor.cpp |  3 +-
 .../Lower/OpenMP/derived-type-allocatable.f90 | 38 +++++++-
 .../Lower/OpenMP/private-derived-type.f90     |  6 +-
 7 files changed, 134 insertions(+), 31 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index b48b115b7a67cd..e927e6cdd5dcc7 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -168,7 +168,7 @@ void DataSharingProcessor::cloneSymbol(const semantics::Symbol *sym) {
 
   if (needInitClone()) {
     Fortran::lower::initializeCloneAtRuntime(converter, *sym, symTable);
-    callsInitClone = true;
+    mightHaveReadMoldArg = true;
   }
 }
 
@@ -220,7 +220,8 @@ bool DataSharingProcessor::needBarrier() {
   // Emit implicit barrier for linear clause. Maybe on somewhere else.
   for (const semantics::Symbol *sym : allPrivatizedSymbols) {
     if (sym->test(semantics::Symbol::Flag::OmpLastPrivate) &&
-        (sym->test(semantics::Symbol::Flag::OmpFirstPrivate) || callsInitClone))
+        (sym->test(semantics::Symbol::Flag::OmpFirstPrivate) ||
+         mightHaveReadMoldArg))
       return true;
   }
   return false;
@@ -578,7 +579,14 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
       populateByRefInitAndCleanupRegions(
           firOpBuilder, symLoc, argType, /*scalarInitValue=*/nullptr, initBlock,
           result.getInitPrivateArg(), result.getInitMoldArg(),
-          result.getDeallocRegion(), /*isPrivate=*/true, sym);
+          result.getDeallocRegion(),
+          isFirstPrivate ? DeclOperationKind::FirstPrivate
+                         : DeclOperationKind::Private,
+          sym);
+      // TODO: currently there are false positives from dead uses of the mold
+      // arg
+      if (!result.getInitMoldArg().getUses().empty())
+        mightHaveReadMoldArg = true;
     }
 
     // Populate the `copy` region if this is a `firstprivate`.
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.h b/flang/lib/Lower/OpenMP/DataSharingProcessor.h
index 8c7a222ec939ff..51f42f01b46119 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.h
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.h
@@ -86,7 +86,7 @@ class DataSharingProcessor {
   lower::pft::Evaluation &eval;
   bool shouldCollectPreDeterminedSymbols;
   bool useDelayedPrivatization;
-  bool callsInitClone = false;
+  bool mightHaveReadMoldArg = false;
   lower::SymMap &symTable;
   OMPConstructSymbolVisitor visitor;
 
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 85ff5dcee0990c..3e564bf5d4b3a7 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -122,6 +122,58 @@ fir::ShapeShiftOp Fortran::lower::omp::getShapeShift(fir::FirOpBuilder &builder,
   return shapeShift;
 }
 
+// Initialize box newBox using moldBox. These should both have the same type and
+// be boxes containing derived types e.g.
+// fir.box<!fir.type<>>
+// fir.box<!fir.heap<!fir.type<>>>
+// fir.box<!fir.heap<!fir.array<!fir.type<>>>>
+// fir.class<...<!fir.type<>>>
+// If the type doesn't match, this does nothing.
+static void initializeIfDerivedTypeBox(fir::FirOpBuilder &builder,
+                                       mlir::Location loc, mlir::Value newBox,
+                                       mlir::Value moldBox, bool hasInitializer,
+                                       bool isFirstPrivate) {
+  fir::BoxType boxTy = mlir::dyn_cast<fir::BoxType>(newBox.getType());
+  fir::ClassType classTy = mlir::dyn_cast<fir::ClassType>(newBox.getType());
+  if (!boxTy && !classTy)
+    return;
+
+  // remove pointer and array types in the middle
+  mlir::Type eleTy;
+  if (boxTy)
+    eleTy = boxTy.getElementType();
+  if (classTy)
+    eleTy = classTy.getEleTy();
+  mlir::Type derivedTy = fir::unwrapRefType(eleTy);
+  if (auto array = mlir::dyn_cast<fir::SequenceType>(derivedTy))
+    derivedTy = array.getElementType();
+
+  if (!fir::isa_derived(derivedTy))
+    return;
+  assert(moldBox.getType() == newBox.getType());
+
+  if (hasInitializer)
+    fir::runtime::genDerivedTypeInitialize(builder, loc, newBox);
+
+  if (hlfir::mayHaveAllocatableComponent(derivedTy) && !isFirstPrivate)
+    fir::runtime::genDerivedTypeInitializeClone(builder, loc, newBox, moldBox);
+}
+
+static bool
+isDerivedTypeNeedingInitialization(const Fortran::semantics::Symbol &sym) {
+  // Fortran::lower::hasDefaultInitialization returns false for ALLOCATABLE, so
+  // re-implement here.
+  // ignorePointer=true because either the pointer points to the same target as
+  // the original variable, or it is uninitialized.
+  if (const Fortran::semantics::DeclTypeSpec *declTypeSpec = sym.GetType())
+    if (const Fortran::semantics::DerivedTypeSpec *derivedTypeSpec =
+            declTypeSpec->AsDerived())
+      if (derivedTypeSpec->HasDefaultInitialization(
+              /*ignoreAllocatable=*/false, /*ignorePointer=*/true))
+        return true;
+  return false;
+}
+
 static mlir::Value generateZeroShapeForRank(fir::FirOpBuilder &builder,
                                             mlir::Location loc,
                                             mlir::Value moldArg) {
@@ -145,7 +197,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
     mlir::Value scalarInitValue, mlir::Block *initBlock,
     mlir::Value allocatedPrivVarArg, mlir::Value moldArg,
-    mlir::Region &cleanupRegion, bool isPrivate,
+    mlir::Region &cleanupRegion, DeclOperationKind kind,
     const Fortran::semantics::Symbol *sym) {
   mlir::Type ty = fir::unwrapRefType(argType);
   builder.setInsertionPointToEnd(initBlock);
@@ -153,11 +205,10 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     builder.create<mlir::omp::YieldOp>(loc, ret);
   };
 
-  if (isPrivate)
+  if (isPrivatization(kind))
     assert(sym && "Symbol information is needed to privatize derived types");
   bool needsInitialization =
-      sym ? Fortran::lower::hasDefaultInitialization(sym->GetUltimate())
-          : false;
+      sym ? isDerivedTypeNeedingInitialization(sym->GetUltimate()) : false;
 
   if (fir::isa_trivial(ty)) {
     builder.setInsertionPointToEnd(initBlock);
@@ -210,7 +261,8 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
 
     // The initial state of a private pointer is undefined so we don't need to
     // match the mold argument (OpenMP 5.2 end of page 106).
-    if (isPrivate && mlir::isa<fir::PointerType>(boxTy.getEleTy())) {
+    if (isPrivatization(kind) &&
+        mlir::isa<fir::PointerType>(boxTy.getEleTy())) {
       // we need a shape with the right rank so that the embox op is lowered
       // to an llvm struct of the right type. This returns nullptr if the types
       // aren't right.
@@ -242,7 +294,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
         TODO(loc, "Reduction/Privatization of non-allocatable trivial or "
                   "character typed box");
 
-      if ((isDerived || isChar) && (!isPrivate || scalarInitValue))
+      if ((isDerived || isChar) && (isReduction(kind) || scalarInitValue))
         TODO(loc, "Reduction of an unsupported boxed type");
 
       fir::IfOp ifUnallocated{nullptr};
@@ -259,8 +311,9 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
       mlir::Value box = builder.create<fir::EmboxOp>(
           loc, ty, valAlloc, /*shape=*/mlir::Value{}, /*slice=*/mlir::Value{},
           lenParams);
-      if (needsInitialization)
-        fir::runtime::genDerivedTypeInitialize(builder, loc, box);
+      initializeIfDerivedTypeBox(
+          builder, loc, box, moldArg, needsInitialization,
+          /*isFirstPrivate=*/kind == DeclOperationKind::FirstPrivate);
       fir::StoreOp lastOp = builder.create<fir::StoreOp>(loc, box, boxAlloca);
 
       createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
@@ -335,8 +388,10 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
 
     if (scalarInitValue)
       builder.create<hlfir::AssignOp>(loc, scalarInitValue, box);
-    if (needsInitialization)
-      fir::runtime::genDerivedTypeInitialize(builder, loc, box);
+
+    initializeIfDerivedTypeBox(builder, loc, box, moldArg, needsInitialization,
+                               /*isFirstPrivate=*/kind ==
+                                   DeclOperationKind::FirstPrivate);
 
     builder.create<fir::StoreOp>(loc, box, boxAlloca);
     if (ifUnallocated)
@@ -371,13 +426,15 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
   }
 
   if (fir::isa_derived(ty)) {
-    if (needsInitialization) {
-      builder.setInsertionPointToStart(initBlock);
-      mlir::Type boxedTy = fir::BoxType::get(ty);
-      mlir::Value box =
-          builder.create<fir::EmboxOp>(loc, boxedTy, allocatedPrivVarArg);
-      fir::runtime::genDerivedTypeInitialize(builder, loc, box);
-    }
+    builder.setInsertionPointToStart(initBlock);
+    mlir::Type boxedTy = fir::BoxType::get(ty);
+    mlir::Value newBox =
+        builder.create<fir::EmboxOp>(loc, boxedTy, allocatedPrivVarArg);
+    mlir::Value moldBox = builder.create<fir::EmboxOp>(loc, boxedTy, moldArg);
+    initializeIfDerivedTypeBox(
+        builder, loc, newBox, moldBox, needsInitialization,
+        /*isFirstPrivate=*/kind == DeclOperationKind::FirstPrivate);
+
     if (sym && hasFinalization(*sym))
       createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
 
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
index b8d0db54ded73c..0b4a5de445c9d8 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
@@ -35,6 +35,15 @@ namespace Fortran {
 namespace lower {
 namespace omp {
 
+enum class DeclOperationKind { Private, FirstPrivate, Reduction };
+inline bool isPrivatization(DeclOperationKind kind) {
+  return (kind == DeclOperationKind::FirstPrivate) ||
+         (kind == DeclOperationKind::Private);
+}
+inline bool isReduction(DeclOperationKind kind) {
+  return kind == DeclOperationKind::Reduction;
+}
+
 /// Generate init and cleanup regions suitable for reduction or privatizer
 /// declarations. `scalarInitValue` may be nullptr if there is no default
 /// initialization (for privatization). If this is for a privatizer, set
@@ -43,7 +52,7 @@ void populateByRefInitAndCleanupRegions(
     fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
     mlir::Value scalarInitValue, mlir::Block *initBlock,
     mlir::Value allocatedPrivVarArg, mlir::Value moldArg,
-    mlir::Region &cleanupRegion, bool isPrivate = false,
+    mlir::Region &cleanupRegion, DeclOperationKind kind,
     const Fortran::semantics::Symbol *sym = nullptr);
 
 /// Generate a fir::ShapeShift op describing the provided boxed array.
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index 2cd21107a916e4..f85acada8b9855 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -442,7 +442,8 @@ static void createReductionAllocAndInitRegions(
     populateByRefInitAndCleanupRegions(builder, loc, type, initValue, initBlock,
                                        reductionDecl.getInitializerAllocArg(),
                                        reductionDecl.getInitializerMoldArg(),
-                                       reductionDecl.getCleanupRegion());
+                                       reductionDecl.getCleanupRegion(),
+                                       DeclOperationKind::Reduction);
   }
 
   if (fir::isa_trivial(ty)) {
diff --git a/flang/test/Lower/OpenMP/derived-type-allocatable.f90 b/flang/test/Lower/OpenMP/derived-type-allocatable.f90
index 1d6e22212eedd0..77bc525c390e18 100644
--- a/flang/test/Lower/OpenMP/derived-type-allocatable.f90
+++ b/flang/test/Lower/OpenMP/derived-type-allocatable.f90
@@ -13,32 +13,42 @@ module m1
 
 contains
 
+!CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_class_allocatable
+!CHECK:       fir.call @_FortranAInitialize
+!CHECK:       omp.yield
+
+!CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_allocatable
+!CHECK:       fir.call @_FortranAInitialize
+!CHECK:       omp.yield
+
 !CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_pointer
 !CHECK-NOT:   fir.call @_FortranAInitializeClone
 !CHECK:       omp.yield
 
 !CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_nested
 !CHECK:       fir.call @_FortranAInitializeClone
-!CHECK-NEXT:  omp.yield
+!CHECK:       omp.yield
 
 !CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_array_of_allocs
 !CHECK:       fir.call @_FortranAInitializeClone
-!CHECK-NEXT:  omp.yield
+!CHECK:       omp.yield
 !CHECK:       } dealloc {
 !CHECK:       fir.call @_FortranAAllocatableDeallocate
 !CHECK:       omp.yield
 
 !CHECK-LABEL: omp.private {type = firstprivate} @_QMm1Ftest_array
+!CHECK:       fir.call @_FortranAInitialize(
 !CHECK-NOT:   fir.call @_FortranAInitializeClone
 !CHECK:       omp.yield
 
 !CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_array
+!CHECK:       fir.call @_FortranAInitialize(
 !CHECK:       fir.call @_FortranAInitializeClone
-!CHECK-NEXT:  omp.yield
+!CHECK:       omp.yield
 
 !CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_scalar
 !CHECK:       fir.call @_FortranAInitializeClone
-!CHECK-NEXT:  omp.yield
+!CHECK:       omp.yield
 
   subroutine test_scalar()
     type(x) :: v
@@ -105,4 +115,24 @@ subroutine test_pointer()
     !$omp parallel private(ptr)
     !$omp end parallel
   end subroutine
+
+  subroutine test_allocatable()
+    type needs_init
+      integer :: i = 1
+    end type
+    type(needs_init), allocatable :: a
+
+    !$omp parallel private(a)
+    !$omp end parallel
+  end subroutine
+
+  subroutine test_class_allocatable()
+    type needs_init
+      integer :: i = 1
+    end type
+    class(needs_init), allocatable :: a
+
+    !$omp parallel private(a)
+    !$omp end parallel
+  end subroutine
 end module
diff --git a/flang/test/Lower/OpenMP/private-derived-type.f90 b/flang/test/Lower/OpenMP/private-derived-type.f90
index df1c7c3f922271..91d8fa753f2ec3 100644
--- a/flang/test/Lower/OpenMP/private-derived-type.f90
+++ b/flang/test/Lower/OpenMP/private-derived-type.f90
@@ -15,16 +15,14 @@ subroutine s4
   !$omp end parallel
 end subroutine s4
 
-! CHECK:  omp.private {type = private} @[[DERIVED_PRIV:.*]] : !fir.ref<!fir.type<{{.*}}y3{x:!fir.box<!fir.heap<i32>>}>> alloc {
-! CHECK:             %[[VAL_23:.*]] = fir.alloca !fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}> {bindc_name = "v", pinned, uniq_name = "_QFs4Ev"}
-! CHECK:             %[[VAL_25:.*]] = fir.embox %[[VAL_23]] : (!fir.ref<!fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}>>) -> !fir.box<!fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}>>
+! CHECK:  omp.private {type = private} @[[DERIVED_PRIV:.*]] : !fir.type<{{.*}}y3{x:!fir.box<!fir.heap<i32>>}> init {
+! CHECK:             %[[VAL_25:.*]] = fir.embox %[[VAL_23:.*]] : (!fir.ref<!fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}>>) -> !fir.box<!fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}>>
 ! CHECK:             %[[VAL_26:.*]] = fir.address_of
 ! CHECK:             %[[VAL_27:.*]] = arith.constant 8 : i32
 ! CHECK:             %[[VAL_28:.*]] = fir.convert %[[VAL_25]] : (!fir.box<!fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}>>) -> !fir.box<none>
 ! CHECK:             %[[VAL_29:.*]] = fir.convert %[[VAL_26]] : (!fir.ref<!fir.char<1,{{.*}}>>) -> !fir.ref<i8>
 !                    Check we do call FortranAInitialize on the derived type
 ! CHECK:             fir.call @_FortranAInitialize(%[[VAL_28]], %[[VAL_29]], %[[VAL_27]]) fastmath<contract> : (!fir.box<none>, !fir.ref<i8>, i32) -> ()
-! CHECK:             %[[VAL_24:.*]]:2 = hlfir.declare %[[VAL_23]] {uniq_name = "_QFs4Ev"} : (!fir.ref<!fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}>>) -> (!fir.ref<!fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}>>, !fir.ref<!fir.type<_QFs4Ty3{x:!fir.box<!fir.heap<i32>>}>>)
 ! CHECK:  }
 
 ! CHECK-LABEL:   func.func @_QPs4() {

>From eec1a1cc0e63e3410da2619cbae03fc5d496244d Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Tue, 14 Jan 2025 15:45:06 +0000
Subject: [PATCH 09/12] Deallocate derived types

This will deallocate in exactly the same cases as the old implementation
before this series. I expect some cases are missed but I want to avoid
functional changes wherever possible in this (already too big) patch
series.
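
The C++ below is a hypothetical model of the new branch in
createCleanupRegion(). Consider a box whose element, after stripping any
array type, is a derived type, and which wraps a pointer or heap allocation.
Such a box is freed with genDeallocateIfAllocated. Derived types with length
parameters still hit a TODO, and every other type keeps the previous cleanup
path. BoxedEleKind, BoxedTypeInfo, CleanupAction and chooseCleanup are
illustrative names, not flang APIs.

```cpp
// Hypothetical, simplified view of what a box's element type wraps.
enum class BoxedEleKind { Pointer, Heap, Other };

struct BoxedTypeInfo {
  bool isDerived;        // element (arrays stripped) is a record type
  BoxedEleKind eleKind;  // pointer, heap allocation, or something else
  unsigned numLenParams; // number of length parameters on the record type
};

enum class CleanupAction { DeallocateIfAllocated, Unsupported, LegacyPath };

// Mirrors the order of checks in the new createCleanupRegion() branch.
CleanupAction chooseCleanup(const BoxedTypeInfo &t) {
  bool pointerLike = t.eleKind == BoxedEleKind::Pointer ||
                     t.eleKind == BoxedEleKind::Heap;
  if (t.isDerived && pointerLike) {
    if (t.numLenParams > 0)
      return CleanupAction::Unsupported; // TODO in the patch
    return CleanupAction::DeallocateIfAllocated;
  }
  return CleanupAction::LegacyPath; // unchanged pre-existing cleanup
}
```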
---
 .../lib/Lower/OpenMP/DataSharingProcessor.cpp |  2 +-
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 44 +++++++++++++++----
 .../lib/Lower/OpenMP/PrivateReductionUtils.h  |  4 +-
 flang/lib/Lower/OpenMP/ReductionProcessor.cpp | 21 +++++----
 flang/lib/Lower/OpenMP/ReductionProcessor.h   |  2 +-
 5 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index e927e6cdd5dcc7..8b70f5a5fde1c2 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -577,7 +577,7 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
           &initRegion, /*insertPt=*/{}, {argType, argType}, {symLoc, symLoc});
 
       populateByRefInitAndCleanupRegions(
-          firOpBuilder, symLoc, argType, /*scalarInitValue=*/nullptr, initBlock,
+          converter, symLoc, argType, /*scalarInitValue=*/nullptr, initBlock,
           result.getInitPrivateArg(), result.getInitMoldArg(),
           result.getDeallocRegion(),
           isFirstPrivate ? DeclOperationKind::FirstPrivate
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 3e564bf5d4b3a7..37c716fff30e64 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -12,6 +12,8 @@
 
 #include "PrivateReductionUtils.h"
 
+#include "flang/Lower/AbstractConverter.h"
+#include "flang/Lower/Allocatable.h"
 #include "flang/Lower/ConvertVariable.h"
 #include "flang/Optimizer/Builder/BoxValue.h"
 #include "flang/Optimizer/Builder/Character.h"
@@ -37,9 +39,11 @@ static bool hasFinalization(const Fortran::semantics::Symbol &sym) {
   return false;
 }
 
-static void createCleanupRegion(fir::FirOpBuilder &builder, mlir::Location loc,
-                                mlir::Type argType, mlir::Region &cleanupRegion,
+static void createCleanupRegion(Fortran::lower::AbstractConverter &converter,
+                                mlir::Location loc, mlir::Type argType,
+                                mlir::Region &cleanupRegion,
                                 const Fortran::semantics::Symbol *sym) {
+  fir::FirOpBuilder &builder = converter.getFirOpBuilder();
   assert(cleanupRegion.empty());
   mlir::Block *block = builder.createBlock(&cleanupRegion, cleanupRegion.end(),
                                            {argType}, {loc});
@@ -54,6 +58,29 @@ static void createCleanupRegion(fir::FirOpBuilder &builder, mlir::Location loc,
 
   mlir::Type valTy = fir::unwrapRefType(argType);
   if (auto boxTy = mlir::dyn_cast_or_null<fir::BaseBoxType>(valTy)) {
+    // TODO: what about undoing init of unboxed derived types?
+    if (auto recTy = mlir::dyn_cast<fir::RecordType>(
+            fir::unwrapSequenceType(fir::dyn_cast_ptrOrBoxEleTy(boxTy)))) {
+      mlir::Type eleTy = boxTy.getEleTy();
+      if (mlir::isa<fir::PointerType, fir::HeapType>(eleTy)) {
+        mlir::Type mutableBoxTy =
+            fir::ReferenceType::get(fir::BoxType::get(eleTy));
+        mlir::Value converted =
+            builder.createConvert(loc, mutableBoxTy, block->getArgument(0));
+        if (recTy.getNumLenParams() > 0)
+          TODO(loc, "Deallocate box with length parameters");
+        fir::MutableBoxValue mutableBox{converted, /*lenParameters=*/{},
+                                        /*mutableProperties=*/{}};
+        Fortran::lower::genDeallocateIfAllocated(converter, mutableBox, loc);
+        builder.create<mlir::omp::YieldOp>(loc);
+        return;
+      }
+    }
+
+    // TODO: just replace this whole body with
+    // Fortran::lower::genDeallocateIfAllocated (not done now to avoid test
+    // churn)
+
     mlir::Value arg = builder.loadIfRef(loc, block->getArgument(0));
     assert(mlir::isa<fir::BaseBoxType>(arg.getType()));
 
@@ -194,11 +221,12 @@ static mlir::Value generateZeroShapeForRank(fir::FirOpBuilder &builder,
 }
 
 void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
-    fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
-    mlir::Value scalarInitValue, mlir::Block *initBlock,
+    Fortran::lower::AbstractConverter &converter, mlir::Location loc,
+    mlir::Type argType, mlir::Value scalarInitValue, mlir::Block *initBlock,
     mlir::Value allocatedPrivVarArg, mlir::Value moldArg,
     mlir::Region &cleanupRegion, DeclOperationKind kind,
     const Fortran::semantics::Symbol *sym) {
+  fir::FirOpBuilder &builder = converter.getFirOpBuilder();
   mlir::Type ty = fir::unwrapRefType(argType);
   builder.setInsertionPointToEnd(initBlock);
   auto yield = [&](mlir::Value ret) {
@@ -316,7 +344,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
           /*isFirstPrivate=*/kind == DeclOperationKind::FirstPrivate);
       fir::StoreOp lastOp = builder.create<fir::StoreOp>(loc, box, boxAlloca);
 
-      createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
+      createCleanupRegion(converter, loc, argType, cleanupRegion, sym);
 
       if (ifUnallocated)
         builder.setInsertionPointAfter(ifUnallocated);
@@ -364,7 +392,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
            "createTempFromMold decides this statically");
     if (cstNeedsDealloc.has_value() && *cstNeedsDealloc != false) {
       mlir::OpBuilder::InsertionGuard guard(builder);
-      createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
+      createCleanupRegion(converter, loc, argType, cleanupRegion, sym);
     } else {
       assert(!isAllocatableOrPointer &&
              "Pointer-like arrays must be heap allocated");
@@ -418,7 +446,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
         loc, eleTy, /*name=*/{}, /*shape=*/{}, /*lenParams=*/len);
     mlir::Value boxChar = charExprHelper.createEmboxChar(privateAddr, len);
 
-    createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
+    createCleanupRegion(converter, loc, argType, cleanupRegion, sym);
 
     builder.setInsertionPointToEnd(initBlock);
     yield(boxChar);
@@ -436,7 +464,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
         /*isFirstPrivate=*/kind == DeclOperationKind::FirstPrivate);
 
     if (sym && hasFinalization(*sym))
-      createCleanupRegion(builder, loc, argType, cleanupRegion, sym);
+      createCleanupRegion(converter, loc, argType, cleanupRegion, sym);
 
     builder.setInsertionPointToEnd(initBlock);
     yield(allocatedPrivVarArg);
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
index 0b4a5de445c9d8..7b7adc09c835b2 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
@@ -33,6 +33,8 @@ class ShapeShiftOp;
 
 namespace Fortran {
 namespace lower {
+class AbstractConverter;
+
 namespace omp {
 
 enum class DeclOperationKind { Private, FirstPrivate, Reduction };
@@ -49,7 +51,7 @@ inline bool isReduction(DeclOperationKind kind) {
 /// initialization (for privatization). If this is for a privatizer, set
 /// `isPrivate` to `true`.
 void populateByRefInitAndCleanupRegions(
-    fir::FirOpBuilder &builder, mlir::Location loc, mlir::Type argType,
+    AbstractConverter &converter, mlir::Location loc, mlir::Type argType,
     mlir::Value scalarInitValue, mlir::Block *initBlock,
     mlir::Value allocatedPrivVarArg, mlir::Value moldArg,
     mlir::Region &cleanupRegion, DeclOperationKind kind,
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index f85acada8b9855..4a811f1bdfdf53 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -410,10 +410,11 @@ static mlir::Type unwrapSeqOrBoxedType(mlir::Type ty) {
 }
 
 static void createReductionAllocAndInitRegions(
-    fir::FirOpBuilder &builder, mlir::Location loc,
+    AbstractConverter &converter, mlir::Location loc,
     mlir::omp::DeclareReductionOp &reductionDecl,
     const ReductionProcessor::ReductionIdentifier redId, mlir::Type type,
     bool isByRef) {
+  fir::FirOpBuilder &builder = converter.getFirOpBuilder();
   auto yield = [&](mlir::Value ret) {
     builder.create<mlir::omp::YieldOp>(loc, ret);
   };
@@ -439,11 +440,11 @@ static void createReductionAllocAndInitRegions(
       loc, unwrapSeqOrBoxedType(ty), redId, builder);
 
   if (isByRef) {
-    populateByRefInitAndCleanupRegions(builder, loc, type, initValue, initBlock,
-                                       reductionDecl.getInitializerAllocArg(),
-                                       reductionDecl.getInitializerMoldArg(),
-                                       reductionDecl.getCleanupRegion(),
-                                       DeclOperationKind::Reduction);
+    populateByRefInitAndCleanupRegions(
+        converter, loc, type, initValue, initBlock,
+        reductionDecl.getInitializerAllocArg(),
+        reductionDecl.getInitializerMoldArg(), reductionDecl.getCleanupRegion(),
+        DeclOperationKind::Reduction);
   }
 
   if (fir::isa_trivial(ty)) {
@@ -467,9 +468,10 @@ static void createReductionAllocAndInitRegions(
 }
 
 mlir::omp::DeclareReductionOp ReductionProcessor::createDeclareReduction(
-    fir::FirOpBuilder &builder, llvm::StringRef reductionOpName,
+    AbstractConverter &converter, llvm::StringRef reductionOpName,
     const ReductionIdentifier redId, mlir::Type type, mlir::Location loc,
     bool isByRef) {
+  fir::FirOpBuilder &builder = converter.getFirOpBuilder();
   mlir::OpBuilder::InsertionGuard guard(builder);
   mlir::ModuleOp module = builder.getModule();
 
@@ -487,7 +489,8 @@ mlir::omp::DeclareReductionOp ReductionProcessor::createDeclareReduction(
 
   decl = modBuilder.create<mlir::omp::DeclareReductionOp>(loc, reductionOpName,
                                                           type);
-  createReductionAllocAndInitRegions(builder, loc, decl, redId, type, isByRef);
+  createReductionAllocAndInitRegions(converter, loc, decl, redId, type,
+                                     isByRef);
 
   builder.createBlock(&decl.getReductionRegion(),
                       decl.getReductionRegion().end(), {type, type},
@@ -646,7 +649,7 @@ void ReductionProcessor::addDeclareReduction(
       TODO(currentLocation, "Unexpected reduction type");
     }
 
-    decl = createDeclareReduction(firOpBuilder, reductionName, redId, redType,
+    decl = createDeclareReduction(converter, reductionName, redId, redType,
                                   currentLocation, isByRef);
     reductionDeclSymbols.push_back(
         mlir::SymbolRefAttr::get(firOpBuilder.getContext(), decl.getSymName()));
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.h b/flang/lib/Lower/OpenMP/ReductionProcessor.h
index 5f4d742b62cb10..d7d9b067e0bac6 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.h
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.h
@@ -113,7 +113,7 @@ class ReductionProcessor {
   /// value `initValue`, and the reduction combiner carried over from `reduce`.
   /// TODO: add atomic region.
   static mlir::omp::DeclareReductionOp
-  createDeclareReduction(fir::FirOpBuilder &builder,
+  createDeclareReduction(AbstractConverter &converter,
                          llvm::StringRef reductionOpName,
                          const ReductionIdentifier redId, mlir::Type type,
                          mlir::Location loc, bool isByRef);

>From 863ba4552485815a0d494d08690a4226ee5ca13d Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Tue, 21 Jan 2025 15:23:13 +0000
Subject: [PATCH 10/12] Support arrays of polymorphic types

---
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 22 +++++++++++++++++++
 .../Lower/OpenMP/derived-type-allocatable.f90 | 14 ++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 37c716fff30e64..02f5a4cf652c74 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -376,6 +376,28 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     mlir::Value loadedBox = builder.loadIfRef(loc, moldArg);
     hlfir::Entity source = hlfir::Entity{loadedBox};
 
+    // Special case for (possibly allocatable) arrays of polymorphic types
+    // e.g. !fir.class<!fir.heap<!fir.array<?x!fir.type<>>>>
+    if (source.isPolymorphic()) {
+      fir::ShapeShiftOp shape = getShapeShift(builder, loc, source);
+      mlir::Type arrayType = source.getElementOrSequenceType();
+      mlir::Value allocatedArray = builder.create<fir::AllocMemOp>(
+          loc, arrayType, /*typeparams=*/mlir::ValueRange{},
+          shape.getExtents());
+      mlir::Value firClass = builder.create<fir::EmboxOp>(
+          loc, source.getType(), allocatedArray, shape);
+      initializeIfDerivedTypeBox(
+          builder, loc, firClass, source, needsInitialization,
+          /*isFirstPrivate=*/kind == DeclOperationKind::FirstPrivate);
+      builder.create<fir::StoreOp>(loc, firClass, allocatedPrivVarArg);
+      if (ifUnallocated)
+        builder.setInsertionPointAfter(ifUnallocated);
+      yield(allocatedPrivVarArg);
+      mlir::OpBuilder::InsertionGuard guard(builder);
+      createCleanupRegion(converter, loc, argType, cleanupRegion, sym);
+      return;
+    }
+
     // Allocating on the heap in case the whole reduction is nested inside of a
     // loop
     // TODO: compare performance here to using allocas - this could be made to
diff --git a/flang/test/Lower/OpenMP/derived-type-allocatable.f90 b/flang/test/Lower/OpenMP/derived-type-allocatable.f90
index 77bc525c390e18..86434569f9653b 100644
--- a/flang/test/Lower/OpenMP/derived-type-allocatable.f90
+++ b/flang/test/Lower/OpenMP/derived-type-allocatable.f90
@@ -13,6 +13,10 @@ module m1
 
 contains
 
+!CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_class_allocatable_array
+!CHECK:       fir.call @_FortranAInitialize
+!CHECK:       omp.yield
+
 !CHECK-LABEL: omp.private {type = private} @_QMm1Ftest_class_allocatable
 !CHECK:       fir.call @_FortranAInitialize
 !CHECK:       omp.yield
@@ -135,4 +139,14 @@ subroutine test_class_allocatable()
     !$omp parallel private(a)
     !$omp end parallel
   end subroutine
+
+  subroutine test_class_allocatable_array()
+    type needs_init
+      integer :: i = 1
+    end type
+    class(needs_init), allocatable :: a(:)
+
+    !$omp parallel private(a)
+    !$omp end parallel
+  end subroutine
 end module

>From 04e6c516555869dc60f68c694367d1c54d0c9eba Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Fri, 10 Jan 2025 18:43:58 +0000
Subject: [PATCH 11/12] Fix compiler crash for unnecessary len params

---
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 46 +++++++++++++------
 flang/test/Lower/OpenMP/copyprivate.f90       | 38 +++++++--------
 .../OpenMP/delayed-privatization-pointer.f90  | 31 +++++++++++--
 3 files changed, 76 insertions(+), 39 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 02f5a4cf652c74..708d92be6fdb0e 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -186,6 +186,29 @@ static void initializeIfDerivedTypeBox(fir::FirOpBuilder &builder,
     fir::runtime::genDerivedTypeInitializeClone(builder, loc, newBox, moldBox);
 }
 
+static void getLengthParameters(fir::FirOpBuilder &builder, mlir::Location loc,
+                                mlir::Value moldArg,
+                                llvm::SmallVectorImpl<mlir::Value> &lenParams) {
+  // Derived types are passed unboxed and so are not self-contained entities.
+  // Assume that unboxed derived types won't need length parameters.
+  if (!hlfir::isFortranEntity(moldArg))
+    return;
+
+  hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg}, lenParams);
+  if (lenParams.empty())
+    return;
+
+  // The verifier for EmboxOp doesn't allow length parameters when the
+  // character already has static LEN. genLengthParameters may still return them
+  // in this case.
+  mlir::Type unwrappedType =
+      fir::unwrapRefType(fir::unwrapSeqOrBoxedSeqType(moldArg.getType()));
+  if (auto strTy = mlir::dyn_cast<fir::CharacterType>(unwrappedType)) {
+    if (strTy.hasConstantLen())
+      lenParams.resize(0);
+  }
+}
+
 static bool
 isDerivedTypeNeedingInitialization(const Fortran::semantics::Symbol &sym) {
   // Fortran::lower::hasDefaultInitialization returns false for ALLOCATABLE, so
@@ -287,6 +310,9 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
     builder.setInsertionPointToEnd(initBlock);
     mlir::Value boxAlloca = allocatedPrivVarArg;
 
+    moldArg = builder.loadIfRef(loc, moldArg);
+    getLengthParameters(builder, loc, moldArg, lenParams);
+
     // The initial state of a private pointer is undefined so we don't need to
     // match the mold argument (OpenMP 5.2 end of page 106).
     if (isPrivatization(kind) &&
@@ -299,20 +325,17 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
       mlir::Value null = builder.createNullConstant(loc, boxTy.getEleTy());
       mlir::Value nullBox;
       if (shape)
-        nullBox = builder.create<fir::EmboxOp>(loc, boxTy, null, shape);
+        nullBox = builder.create<fir::EmboxOp>(
+            loc, boxTy, null, shape, /*slice=*/mlir::Value{}, lenParams);
       else
-        nullBox = builder.create<fir::EmboxOp>(loc, boxTy, null);
+        nullBox = builder.create<fir::EmboxOp>(
+            loc, boxTy, null, /*shape=*/mlir::Value{}, /*slice=*/mlir::Value{},
+            lenParams);
       builder.create<fir::StoreOp>(loc, nullBox, boxAlloca);
       yield(boxAlloca);
       return;
     }
 
-    moldArg = builder.loadIfRef(loc, moldArg);
-    // We pass derived types unboxed and so are not self-contained entities.
-    if (hlfir::isFortranEntity(moldArg))
-      hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg},
-                                 lenParams);
-
     mlir::Type innerTy = fir::unwrapRefType(boxTy.getEleTy());
     bool isDerived = fir::isa_derived(innerTy);
     bool isChar = fir::isa_char(innerTy);
@@ -359,12 +382,7 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
       TODO(loc, "Unsupported boxed type for reduction/privatization");
 
     moldArg = builder.loadIfRef(loc, moldArg);
-    // We pass derived types unboxed and so are not self-contained entities.
-    // Assume that if length parameters are required, they will be boxed by
-    // lowering.
-    if (hlfir::isFortranEntity(moldArg))
-      hlfir::genLengthParameters(loc, builder, hlfir::Entity{moldArg},
-                                 lenParams);
+    getLengthParameters(builder, loc, moldArg, lenParams);
 
     fir::IfOp ifUnallocated{nullptr};
     if (isAllocatableOrPointer) {
diff --git a/flang/test/Lower/OpenMP/copyprivate.f90 b/flang/test/Lower/OpenMP/copyprivate.f90
index 761e6190ed6efc..4c3ed9389369f3 100644
--- a/flang/test/Lower/OpenMP/copyprivate.f90
+++ b/flang/test/Lower/OpenMP/copyprivate.f90
@@ -14,13 +14,13 @@
 !CHECK-DAG: func private @_copy_c16x8(%{{.*}}: !fir.ref<!fir.char<2,8>>, %{{.*}}: !fir.ref<!fir.char<2,8>>)
 
 !CHECK-DAG: func private @_copy_box_Uxi32(%{{.*}}: !fir.ref<!fir.box<!fir.array<?xi32>>>, %{{.*}}: !fir.ref<!fir.box<!fir.array<?xi32>>>)
-!CHECK-DAG: func private @_copy_10xi32(%{{.*}}: !fir.ref<!fir.array<10xi32>>, %{{.*}}: !fir.ref<!fir.array<10xi32>>)
-!CHECK-DAG: func private @_copy_3x4xi32(%{{.*}}: !fir.ref<!fir.array<3x4xi32>>, %{{.*}}: !fir.ref<!fir.array<3x4xi32>>)
-!CHECK-DAG: func private @_copy_10xf32(%{{.*}}: !fir.ref<!fir.array<10xf32>>, %{{.*}}: !fir.ref<!fir.array<10xf32>>)
-!CHECK-DAG: func private @_copy_3x4xz32(%{{.*}}: !fir.ref<!fir.array<3x4xcomplex<f32>>>, %{{.*}}: !fir.ref<!fir.array<3x4xcomplex<f32>>>)
-!CHECK-DAG: func private @_copy_10xl32(%{{.*}}: !fir.ref<!fir.array<10x!fir.logical<4>>>, %{{.*}}: !fir.ref<!fir.array<10x!fir.logical<4>>>)
-!CHECK-DAG: func private @_copy_3xc8x8(%{{.*}}: !fir.ref<!fir.array<3x!fir.char<1,8>>>, %{{.*}}: !fir.ref<!fir.array<3x!fir.char<1,8>>>)
-!CHECK-DAG: func private @_copy_3xc16x5(%{{.*}}: !fir.ref<!fir.array<3x!fir.char<2,5>>>, %{{.*}}: !fir.ref<!fir.array<3x!fir.char<2,5>>>)
+!CHECK-DAG: func private @_copy_box_10xi32(%{{.*}}: !fir.ref<!fir.box<!fir.array<10xi32>>>, %{{.*}}: !fir.ref<!fir.box<!fir.array<10xi32>>>)
+!CHECK-DAG: func private @_copy_box_3x4xi32(%{{.*}}: !fir.ref<!fir.box<!fir.array<3x4xi32>>>, %{{.*}}: !fir.ref<!fir.box<!fir.array<3x4xi32>>>)
+!CHECK-DAG: func private @_copy_box_10xf32(%{{.*}}: !fir.ref<!fir.box<!fir.array<10xf32>>>, %{{.*}}: !fir.ref<!fir.box<!fir.array<10xf32>>>)
+!CHECK-DAG: func private @_copy_box_3x4xz32(%{{.*}}: !fir.ref<!fir.box<!fir.array<3x4xcomplex<f32>>>>, %{{.*}}: !fir.ref<!fir.box<!fir.array<3x4xcomplex<f32>>>>)
+!CHECK-DAG: func private @_copy_box_10xl32(%{{.*}}: !fir.ref<!fir.box<!fir.array<10x!fir.logical<4>>>>, %{{.*}}: !fir.ref<!fir.box<!fir.array<10x!fir.logical<4>>>>)
+!CHECK-DAG: func private @_copy_box_3xc8x8(%{{.*}}: !fir.ref<!fir.box<!fir.array<3x!fir.char<1,8>>>>, %{{.*}}: !fir.ref<!fir.box<!fir.array<3x!fir.char<1,8>>>>)
+!CHECK-DAG: func private @_copy_box_3xc16x5(%{{.*}}: !fir.ref<!fir.box<!fir.array<3x!fir.char<2,5>>>>, %{{.*}}: !fir.ref<!fir.box<!fir.array<3x!fir.char<2,5>>>>)
 
 !CHECK-DAG: func private @_copy_rec__QFtest_dtTdt(%{{.*}}: !fir.ref<!fir.type<_QFtest_dtTdt{i:i32,r:f32}>>, %{{.*}}: !fir.ref<!fir.type<_QFtest_dtTdt{i:i32,r:f32}>>)
 !CHECK-DAG: func private @_copy_box_heap_Uxi32(%{{.*}}: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, %{{.*}}: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
@@ -95,20 +95,16 @@ subroutine test_scalar()
 
 !CHECK-LABEL: func @_QPtest_array
 !CHECK:         omp.parallel
-!CHECK:           %[[A:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFtest_arrayEa"} : (!fir.box<!fir.array<?xi32>>, !fir.shift<1>) -> (!fir.box<!fir.array<?xi32>>, !fir.box<!fir.array<?xi32>>)
-!CHECK:           %[[I1:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFtest_arrayEi1"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
-!CHECK:           %[[I2:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFtest_arrayEi2"} : (!fir.ref<!fir.array<3x4xi32>>, !fir.shape<2>) -> (!fir.ref<!fir.array<3x4xi32>>, !fir.ref<!fir.array<3x4xi32>>)
-!CHECK:           %[[I3:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFtest_arrayEi3"} : (!fir.ref<!fir.array<?xi32>>, !fir.shapeshift<1>) -> (!fir.box<!fir.array<?xi32>>, !fir.ref<!fir.array<?xi32>>)
-!CHECK:           %[[R1:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFtest_arrayEr1"} : (!fir.ref<!fir.array<10xf32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xf32>>, !fir.ref<!fir.array<10xf32>>)
-!CHECK:           %[[C1:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFtest_arrayEc1"} : (!fir.ref<!fir.array<3x4xcomplex<f32>>>, !fir.shape<2>) -> (!fir.ref<!fir.array<3x4xcomplex<f32>>>, !fir.ref<!fir.array<3x4xcomplex<f32>>>)
-!CHECK:           %[[L1:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFtest_arrayEl1"} : (!fir.ref<!fir.array<10x!fir.logical<4>>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10x!fir.logical<4>>>, !fir.ref<!fir.array<10x!fir.logical<4>>>)
-!CHECK:           %[[S1:.*]]:2 = hlfir.declare {{.*}} {uniq_name = "_QFtest_arrayEs1"} : (!fir.ref<!fir.array<3x!fir.char<1,8>>>, !fir.shape<1>, index) -> (!fir.ref<!fir.array<3x!fir.char<1,8>>>, !fir.ref<!fir.array<3x!fir.char<1,8>>>)
-!CHECK:           %[[S2:.*]]:2 = hlfir.declare {{.*}} {uniq_name = "_QFtest_arrayEs2"} : (!fir.ref<!fir.array<3x!fir.char<2,5>>>, !fir.shape<1>, index) -> (!fir.ref<!fir.array<3x!fir.char<2,5>>>, !fir.ref<!fir.array<3x!fir.char<2,5>>>)
-!CHECK:           %[[A_REF:.*]] = fir.alloca !fir.box<!fir.array<?xi32>>
-!CHECK:           fir.store %[[A]]#0 to %[[A_REF]] : !fir.ref<!fir.box<!fir.array<?xi32>>>
-!CHECK:           %[[I3_REF:.*]] = fir.alloca !fir.box<!fir.array<?xi32>>
-!CHECK:           fir.store %[[I3]]#0 to %[[I3_REF]] : !fir.ref<!fir.box<!fir.array<?xi32>>>
-!CHECK:           omp.single copyprivate(%[[A_REF]] -> @_copy_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>>, %[[I1]]#0 -> @_copy_10xi32 : !fir.ref<!fir.array<10xi32>>, %[[I2]]#0 -> @_copy_3x4xi32 : !fir.ref<!fir.array<3x4xi32>>, %[[I3_REF]] -> @_copy_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>>, %[[R1]]#0 -> @_copy_10xf32 : !fir.ref<!fir.array<10xf32>>, %[[C1]]#0 -> @_copy_3x4xz32 : !fir.ref<!fir.array<3x4xcomplex<f32>>>, %[[L1]]#0 -> @_copy_10xl32 : !fir.ref<!fir.array<10x!fir.logical<4>>>, %[[S1]]#0 -> @_copy_3xc8x8 : !fir.ref<!fir.array<3x!fir.char<1,8>>>, %[[S2]]#0 -> @_copy_3xc16x5 : !fir.ref<!fir.array<3x!fir.char<2,5>>>)
+!CHECK:           %[[A:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QFtest_arrayEa"}
+!CHECK:           %[[I1:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QFtest_arrayEi1"}
+!CHECK:           %[[I2:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QFtest_arrayEi2"}
+!CHECK:           %[[I3:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QFtest_arrayEi3"}
+!CHECK:           %[[R1:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QFtest_arrayEr1"}
+!CHECK:           %[[C1:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QFtest_arrayEc1"}
+!CHECK:           %[[L1:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "_QFtest_arrayEl1"}
+!CHECK:           %[[S1:.*]]:2 = hlfir.declare {{.*}} {uniq_name = "_QFtest_arrayEs1"}
+!CHECK:           %[[S2:.*]]:2 = hlfir.declare {{.*}} {uniq_name = "_QFtest_arrayEs2"}
+!CHECK:           omp.single copyprivate(%[[A]]#0 -> @_copy_box_Uxi32 : {{.*}}, %[[I1]]#0 -> @_copy_box_10xi32 : {{.*}}, %[[I2]]#0 -> @_copy_box_3x4xi32 : {{.*}}, %[[I3]]#0 -> @_copy_box_Uxi32 : {{.*}}, %[[R1]]#0 -> @_copy_box_10xf32 : {{.*}}, %[[C1]]#0 -> @_copy_box_3x4xz32 : {{.*}}, %[[L1]]#0 -> @_copy_box_10xl32 : {{.*}}, %[[S1]]#0 -> @_copy_box_3xc8x8 : {{.*}}, %[[S2]]#0 -> @_copy_box_3xc16x5 : {{.*}})
 subroutine test_array(a, n)
   integer :: a(:), n
   integer :: i1(10), i2(3, 4), i3(n)
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-pointer.f90 b/flang/test/Lower/OpenMP/delayed-privatization-pointer.f90
index 9b6aab6b55d693..1dc345c11568c4 100644
--- a/flang/test/Lower/OpenMP/delayed-privatization-pointer.f90
+++ b/flang/test/Lower/OpenMP/delayed-privatization-pointer.f90
@@ -14,19 +14,42 @@ subroutine delayed_privatization_pointer
 !$omp end parallel
 end subroutine
 
-! CHECK-LABEL: omp.private {type = firstprivate}
-! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.box<!fir.ptr<i32>>]] init {
+subroutine delayed_privatization_lenparams(length)
+  integer, intent(in) :: length
+  character(length), pointer :: var
+
+  !$omp parallel firstprivate(var)
+    var = 'a'
+  !$omp end parallel
+end subroutine
 
+! CHECK-LABEL: omp.private {type = firstprivate}
+! CHECK-SAME: @[[PRIVATIZER_SYM2:.*]] : [[TYPE:!fir.box<!fir.ptr<!fir.char<1,\?>>>]] init {
 ! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<[[TYPE]]>, %[[PRIV_ALLOC:.*]]: !fir.ref<[[TYPE]]>):
+! CHECK-NEXT:   %[[ARG:.*]] = fir.load %[[PRIV_ARG]]
+! CHECK-NEXT:   %[[SIZE:.*]] = fir.box_elesize %[[ARG]]
+! CHECK-NEXT:   %[[NULL:.*]] = fir.zero_bits !fir.ptr<!fir.char<1,?>>
+! CHECK-NEXT:   %[[INIT:.*]] = fir.embox %[[NULL]] typeparams %[[SIZE]]
+! CHECK-NEXT:   fir.store %[[INIT]] to %[[PRIV_ALLOC]]
+! CHECK-NEXT:   omp.yield(%[[PRIV_ALLOC]] : !fir.ref<[[TYPE]]>)
+! CHECK-NEXT: } copy {
+! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: !fir.ref<[[TYPE]]>, %[[PRIV_PRIV_ARG:.*]]: !fir.ref<[[TYPE]]>):
+! CHECK-NEXT:    %[[ORIG_BASE_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
+! CHECK-NEXT:   fir.store %[[ORIG_BASE_VAL]] to %[[PRIV_PRIV_ARG]]
+! CHECK-NEXT:   omp.yield(%[[PRIV_PRIV_ARG]] : !fir.ref<[[TYPE]]>)
+! CHECK-NEXT: }
 
+! CHECK-LABEL: omp.private {type = firstprivate}
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.box<!fir.ptr<i32>>]] init {
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: !fir.ref<[[TYPE]]>, %[[PRIV_ALLOC:.*]]: !fir.ref<[[TYPE]]>):
+! CHECK-NEXT:   %[[ARG:.*]] = fir.load %[[PRIV_ARG]]
 ! CHECK-NEXT:   %[[NULL:.*]] = fir.zero_bits !fir.ptr<i32>
 ! CHECK-NEXT:   %[[INIT:.*]] = fir.embox %[[NULL]] : (!fir.ptr<i32>) -> !fir.box<!fir.ptr<i32>>
 ! CHECK-NEXT:   fir.store %[[INIT]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.ptr<i32>>>
 ! CHECK-NEXT:   omp.yield(%[[PRIV_ALLOC]] : !fir.ref<[[TYPE]]>)
-
 ! CHECK-NEXT: } copy {
 ! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: !fir.ref<[[TYPE]]>, %[[PRIV_PRIV_ARG:.*]]: !fir.ref<[[TYPE]]>):
 ! CHECK-NEXT:    %[[ORIG_BASE_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
- ! CHECK-NEXT:   fir.store %[[ORIG_BASE_VAL]] to %[[PRIV_PRIV_ARG]] : !fir.ref<!fir.box<!fir.ptr<i32>>>
+! CHECK-NEXT:   fir.store %[[ORIG_BASE_VAL]] to %[[PRIV_PRIV_ARG]] : !fir.ref<!fir.box<!fir.ptr<i32>>>
 ! CHECK-NEXT:   omp.yield(%[[PRIV_PRIV_ARG]] : !fir.ref<[[TYPE]]>)
 ! CHECK-NEXT: }
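The crash fix in `getLengthParameters` comes down to one rule: if the unwrapped type is a character with a static LEN, discard any length parameters that `hlfir::genLengthParameters` produced before building the `fir.embox`, because the `EmboxOp` verifier rejects them in that case. The C++ sketch below models only that rule. `CharTypeModel` and `pruneLenParams` are hypothetical stand-ins, not flang APIs, and plain integers stand in for `mlir::Value` length parameters.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for fir::CharacterType: LEN is either known at
// compile time (engaged optional) or determined at runtime (nullopt).
struct CharTypeModel {
  std::optional<std::int64_t> constantLen;
  bool hasConstantLen() const { return constantLen.has_value(); }
};

// Models the pruning step: charTy is nullopt when the unwrapped type is not
// a character at all. Length parameters are dropped only for characters
// whose LEN is static; everything else is passed through unchanged.
std::vector<std::int64_t>
pruneLenParams(const std::optional<CharTypeModel> &charTy,
               std::vector<std::int64_t> lenParams) {
  if (charTy && charTy->hasConstantLen())
    lenParams.clear(); // the real code uses lenParams.resize(0)
  return lenParams;
}
```

A `character(length), pointer` variable, as in the new `delayed_privatization_lenparams` test, has a runtime LEN, so its length parameter survives and ends up as the `typeparams` operand of `fir.embox`.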

>From 7cb83e335cfd25ab750e8c15c951dfb9198ac7a1 Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Thu, 23 Jan 2025 15:07:01 +0000
Subject: [PATCH 12/12] Readability improvements

---
 .../lib/Lower/OpenMP/DataSharingProcessor.cpp | 24 +++++++++++--------
 flang/lib/Lower/OpenMP/DataSharingProcessor.h |  2 +-
 .../Lower/OpenMP/PrivateReductionUtils.cpp    | 20 ++++++----------
 .../lib/Lower/OpenMP/PrivateReductionUtils.h  |  4 ++--
 .../OpenMP/MapsForPrivatizedSymbols.cpp       |  2 +-
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td | 10 ++++++--
 6 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 8b70f5a5fde1c2..be28408428d4ae 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -168,7 +168,7 @@ void DataSharingProcessor::cloneSymbol(const semantics::Symbol *sym) {
 
   if (needInitClone()) {
     Fortran::lower::initializeCloneAtRuntime(converter, *sym, symTable);
-    mightHaveReadMoldArg = true;
+    mightHaveReadHostSym = true;
   }
 }
 
@@ -221,7 +221,7 @@ bool DataSharingProcessor::needBarrier() {
   for (const semantics::Symbol *sym : allPrivatizedSymbols) {
     if (sym->test(semantics::Symbol::Flag::OmpLastPrivate) &&
         (sym->test(semantics::Symbol::Flag::OmpFirstPrivate) ||
-         mightHaveReadMoldArg))
+         mightHaveReadHostSym))
       return true;
   }
   return false;
@@ -505,17 +505,15 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
   lower::SymbolBox hsb = converter.lookupOneLevelUpSymbol(*sym);
   assert(hsb && "Host symbol box not found");
 
-  mlir::Value privVal = hsb.getAddr();
-  mlir::Type allocType;
-  if (mlir::isa<fir::PointerType>(privVal.getType()))
-    allocType = privVal.getType();
-  else
-    allocType = fir::unwrapRefType(privVal.getType());
-
   mlir::Location symLoc = hsb.getAddr().getLoc();
   std::string privatizerName = sym->name().ToString() + ".privatizer";
   bool isFirstPrivate = sym->test(semantics::Symbol::Flag::OmpFirstPrivate);
 
+  mlir::Value privVal = hsb.getAddr();
+  mlir::Type allocType = privVal.getType();
+  if (!mlir::isa<fir::PointerType>(privVal.getType()))
+    allocType = fir::unwrapRefType(privVal.getType());
+
   if (auto poly = mlir::dyn_cast<fir::ClassType>(allocType)) {
     if (!mlir::isa<fir::PointerType>(poly.getEleTy()) && isFirstPrivate)
       TODO(symLoc, "create polymorphic host associated copy");
@@ -566,6 +564,12 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
     lower::SymMapScope outerScope(symTable);
 
     // Populate the `init` region.
+    // We need to initialize in the following cases:
+    // 1. The allocation was for a derived type which requires initialization
+    //    (this can be skipped if it will be initialized anyway by the copy
+    //    region, unless the derived type has allocatable components)
+    // 2. The allocation was for any kind of box
+    // 3. The allocation was for a boxed character
     const bool needsInitialization =
         (Fortran::lower::hasDefaultInitialization(sym->GetUltimate()) &&
          (!isFirstPrivate || hlfir::mayHaveAllocatableComponent(allocType))) ||
@@ -586,7 +590,7 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
       // TODO: currently there are false positives from dead uses of the mold
       // arg
       if (!result.getInitMoldArg().getUses().empty())
-        mightHaveReadMoldArg = true;
+        mightHaveReadHostSym = true;
     }
 
     // Populate the `copy` region if this is a `firstprivate`.
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.h b/flang/lib/Lower/OpenMP/DataSharingProcessor.h
index 51f42f01b46119..8e15c6d260389b 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.h
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.h
@@ -86,7 +86,7 @@ class DataSharingProcessor {
   lower::pft::Evaluation &eval;
   bool shouldCollectPreDeterminedSymbols;
   bool useDelayedPrivatization;
-  bool mightHaveReadMoldArg = false;
+  bool mightHaveReadHostSym = false;
   lower::SymMap &symTable;
   OMPConstructSymbolVisitor visitor;
 
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 708d92be6fdb0e..321d3d0e437537 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -160,6 +160,7 @@ static void initializeIfDerivedTypeBox(fir::FirOpBuilder &builder,
                                        mlir::Location loc, mlir::Value newBox,
                                        mlir::Value moldBox, bool hasInitializer,
                                        bool isFirstPrivate) {
+  assert(moldBox.getType() == newBox.getType());
   fir::BoxType boxTy = mlir::dyn_cast<fir::BoxType>(newBox.getType());
   fir::ClassType classTy = mlir::dyn_cast<fir::ClassType>(newBox.getType());
   if (!boxTy && !classTy)
@@ -177,7 +178,6 @@ static void initializeIfDerivedTypeBox(fir::FirOpBuilder &builder,
 
   if (!fir::isa_derived(derivedTy))
     return;
-  assert(moldBox.getType() == newBox.getType());
 
   if (hasInitializer)
     fir::runtime::genDerivedTypeInitialize(builder, loc, newBox);
@@ -218,17 +218,16 @@ isDerivedTypeNeedingInitialization(const Fortran::semantics::Symbol &sym) {
   if (const Fortran::semantics::DeclTypeSpec *declTypeSpec = sym.GetType())
     if (const Fortran::semantics::DerivedTypeSpec *derivedTypeSpec =
             declTypeSpec->AsDerived())
-      if (derivedTypeSpec->HasDefaultInitialization(
-              /*ignoreAllocatable=*/false, /*ignorePointer=*/true))
-        return true;
+      return derivedTypeSpec->HasDefaultInitialization(
+          /*ignoreAllocatable=*/false, /*ignorePointer=*/true);
   return false;
 }
 
 static mlir::Value generateZeroShapeForRank(fir::FirOpBuilder &builder,
                                             mlir::Location loc,
                                             mlir::Value moldArg) {
-  mlir::Type moldVal = fir::unwrapRefType(moldArg.getType());
-  mlir::Type eleType = fir::dyn_cast_ptrOrBoxEleTy(moldVal);
+  mlir::Type moldType = fir::unwrapRefType(moldArg.getType());
+  mlir::Type eleType = fir::dyn_cast_ptrOrBoxEleTy(moldType);
   fir::SequenceType seqTy =
       mlir::dyn_cast_if_present<fir::SequenceType>(eleType);
   if (!seqTy)
@@ -324,13 +323,8 @@ void Fortran::lower::omp::populateByRefInitAndCleanupRegions(
       // Just incase, do initialize the box with a null value
       mlir::Value null = builder.createNullConstant(loc, boxTy.getEleTy());
       mlir::Value nullBox;
-      if (shape)
-        nullBox = builder.create<fir::EmboxOp>(
-            loc, boxTy, null, shape, /*slice=*/mlir::Value{}, lenParams);
-      else
-        nullBox = builder.create<fir::EmboxOp>(
-            loc, boxTy, null, /*shape=*/mlir::Value{}, /*slice=*/mlir::Value{},
-            lenParams);
+      nullBox = builder.create<fir::EmboxOp>(
+          loc, boxTy, null, shape, /*slice=*/mlir::Value{}, lenParams);
       builder.create<fir::StoreOp>(loc, nullBox, boxAlloca);
       yield(boxAlloca);
       return;
diff --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
index 7b7adc09c835b2..fcd36392a29e0a 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.h
@@ -48,8 +48,8 @@ inline bool isReduction(DeclOperationKind kind) {
 
 /// Generate init and cleanup regions suitable for reduction or privatizer
 /// declarations. `scalarInitValue` may be nullptr if there is no default
-/// initialization (for privatization). If this is for a privatizer, set
-/// `isPrivate` to `true`.
+/// initialization (for privatization). `kind` should be set to indicate
+/// what kind of operation definition this initialization belongs to.
 void populateByRefInitAndCleanupRegions(
     AbstractConverter &converter, mlir::Location loc, mlir::Type argType,
     mlir::Value scalarInitValue, mlir::Block *initBlock,
diff --git a/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp b/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
index 5d44dcd042899f..8935b3967e56b6 100644
--- a/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
@@ -65,7 +65,7 @@ class MapsForPrivatizedSymbolsPass
     // decalred needs a descriptor.
     // Some types are boxed immediately before privatization. These have other
     // operations in between the privatization and the declaration. It is safe
-    // to use var directly here because they will be boxed anyay.
+    // to use var directly here because they will be boxed anyway.
     if (auto declOp = llvm::dyn_cast<hlfir::DeclareOp>(definingOp))
       varPtr = declOp.getBase();
 
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index a3a02124ec16bd..1ea115de192d71 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -64,11 +64,17 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
     ```mlir
     omp.private {type = private} @x.privatizer : !some.type init {
     ^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
-    // initialize %arg1, using %arg0 as a mold for allocations
+    // initialize %arg1, using %arg0 as a mold for allocations.
+    // For example, if %arg0 is a heap-allocated array with a runtime-determined
+    // length and !some.type is a runtime type descriptor, the init region
+    // will read the array length from %arg0, heap-allocate an array of the
+    // right length, and initialize %arg1 to contain the array allocation and
+    // length.
     omp.yield(%arg1 : !some.pointer<!some.type>)
     } dealloc {
     ^bb0(%arg0: !some.pointer<!some.type>):
-    ... deallocate allocated memory ...
+    // ... deallocate memory allocated by the init region ...
+    // In the example above, this will free the heap allocated array data.
     omp.yield
     }
     ```
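The `init`/`dealloc` contract described in the updated `omp.private` documentation can be modelled in plain C++. This is a sketch under assumptions, not the lowering itself: `ArrayDescriptor`, `privateInit`, and `privateDealloc` are hypothetical names, an `int` array stands in for `!some.type`, and the caller supplies the storage for the private variable (`%arg1`), in line with this series' goal of letting MLIR to LLVM IR conversion decide where that storage lives.

```cpp
#include <cstddef>

// Hypothetical stand-in for !some.type: a descriptor for a heap-allocated
// int array whose length is only known at runtime.
struct ArrayDescriptor {
  int *data = nullptr;
  std::size_t length = 0;
};

// Models the init region: the mold (%arg0) is only read to learn the length.
// The private descriptor (%arg1) is storage supplied by the caller; init fills
// it with a fresh heap allocation of the right length. Contents are not
// copied from the mold; for firstprivate that is the copy region's job.
ArrayDescriptor *privateInit(const ArrayDescriptor *mold,
                             ArrayDescriptor *priv) {
  priv->length = mold->length;
  priv->data = new int[priv->length](); // value-initialized (zeroed)
  return priv;                          // analogous to omp.yield(%arg1)
}

// Models the dealloc region: free only what init allocated. The descriptor
// storage itself belongs to whoever provided %arg1.
void privateDealloc(ArrayDescriptor *priv) {
  delete[] priv->data;
  priv->data = nullptr;
  priv->length = 0;
}
```

Keeping the descriptor storage outside `privateInit` mirrors the design choice in this PR: the region only initializes memory it is handed, so the same region works whether that memory is a stack slot or part of a task's closure structure.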
