[flang-commits] [flang] ab1db26 - [flang][hlfir] Fixed some finalization/deallocation issues. (#67047)

via flang-commits flang-commits at lists.llvm.org
Fri Sep 22 10:47:58 PDT 2023


Author: Slava Zakharin
Date: 2023-09-22T10:47:53-07:00
New Revision: ab1db262724ecbb87a3957a60faaca6974ea731f

URL: https://github.com/llvm/llvm-project/commit/ab1db262724ecbb87a3957a60faaca6974ea731f
DIFF: https://github.com/llvm/llvm-project/commit/ab1db262724ecbb87a3957a60faaca6974ea731f.diff

LOG: [flang][hlfir] Fixed some finalization/deallocation issues. (#67047)

This set of commits resolves some of the issues with elemental calls producing
results that may require finalization, and also some memory leak issues due to
the missing deallocation of allocatable components of the temporary buffers
created by the bufferization pass.

- [flang][runtime] Expose Finalize API for derived types.

- [flang][hlfir] Add 'finalize' attribute for DestroyOp.

- [flang][hlfir] Postpone result finalization for elemental calls.

    The results of elemental calls generated inside hlfir.elemental must not
    be finalized/destructed before they are copied into the resulting
    array. The finalization must be done on the array as a whole
    (e.g. there might be different scalar and array finalization routines).
    The finalization work is left to the hlfir.destroy corresponding
    to this hlfir.elemental.

- [flang][hlfir] Tighten requirements on hlfir.end_associate operand.

    If component deallocation might be required for the operand of
    hlfir.end_associate, we have to be able to get the variable
    shape/params to create a descriptor for calling the runtime.
    This commit adds verification that we can do so.

- [flang][hlfir] Lower argument clean-ups using valid hlfir.end_associate.

    The operand must be a Fortran entity, when allocatable component
    deallocation may be required.

- [flang][hlfir] Properly clean-up temporary buffers in bufferization pass.

    This commit combines changes for proper finalization and component
    deallocation of the temporary buffers. The finalization part
    relates to hlfir.destroy operations with 'finalize' attribute.
    The component deallocation might be invoked for both hlfir.destroy
    and hlfir.end_associate, if the operand is of a derived type
    with allocatable component(s).

The changes are mostly in one function, so I decided not to split them.

- [flang][hlfir] Disable optimizations for hlfir.elemental requiring finalization.

    If hlfir.elemental is coupled with hlfir.destroy with 'finalize' attribute,
    the temporary array result of hlfir.elemental needs to be created
    for the purpose of finalization. We cannot do certain optimizations
    on such hlfir.elemental operations.

    I was not able to come up with a test for the OptimizedBufferization pass,
    but I put the check there as well.

Added: 
    flang/test/HLFIR/bufferize-destroy-for-derived.fir
    flang/test/HLFIR/bufferize-end-associate-for-derived.fir
    flang/test/Lower/HLFIR/associate-for-args-with-alloc-components.f90
    flang/test/Lower/HLFIR/elemental-call-with-finalization.f90

Modified: 
    flang/include/flang/Lower/ConvertCall.h
    flang/include/flang/Optimizer/Builder/HLFIRTools.h
    flang/include/flang/Optimizer/Builder/Runtime/Derived.h
    flang/include/flang/Optimizer/HLFIR/HLFIRDialect.h
    flang/include/flang/Optimizer/HLFIR/HLFIROps.td
    flang/include/flang/Runtime/derived-api.h
    flang/lib/Lower/ConvertCall.cpp
    flang/lib/Lower/HlfirIntrinsics.cpp
    flang/lib/Optimizer/Builder/HLFIRTools.cpp
    flang/lib/Optimizer/Builder/Runtime/Derived.cpp
    flang/lib/Optimizer/HLFIR/IR/HLFIRDialect.cpp
    flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
    flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
    flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
    flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
    flang/runtime/derived-api.cpp
    flang/test/HLFIR/associate-codegen.fir
    flang/test/HLFIR/bufferize-poly-expr.fir
    flang/test/HLFIR/destroy.fir
    flang/test/HLFIR/elemental-codegen-nested.fir
    flang/test/HLFIR/inline-elemental.fir
    flang/test/HLFIR/invalid.fir
    flang/test/HLFIR/mul_transpose.f90
    flang/test/Lower/HLFIR/poly_expr_for_nonpoly_dummy.f90
    flang/test/Lower/HLFIR/polymorphic-expressions.f90

Removed: 
    


################################################################################
diff  --git a/flang/include/flang/Lower/ConvertCall.h b/flang/include/flang/Lower/ConvertCall.h
index 62ba229614d58c1..f8171236bb39d99 100644
--- a/flang/include/flang/Lower/ConvertCall.h
+++ b/flang/include/flang/Lower/ConvertCall.h
@@ -28,11 +28,13 @@ namespace Fortran::lower {
 /// the call and return the result. This function deals with explicit result
 /// allocation and lowering if needed. It also deals with passing the host
 /// link to internal procedures.
+/// \p isElemental must be set to true if elemental call is being produced.
+/// It is only used for HLFIR.
 fir::ExtendedValue genCallOpAndResult(
     mlir::Location loc, Fortran::lower::AbstractConverter &converter,
     Fortran::lower::SymMap &symMap, Fortran::lower::StatementContext &stmtCtx,
     Fortran::lower::CallerInterface &caller, mlir::FunctionType callSiteType,
-    std::optional<mlir::Type> resultType);
+    std::optional<mlir::Type> resultType, bool isElemental = false);
 
 /// If \p arg is the address of a function with a denoted host-association tuple
 /// argument, then return the host-associations tuple value of the current

diff  --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 6d73ebc3a7e1d9c..393e70f772e5ecf 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -35,37 +35,6 @@ class ElementalOpInterface;
 class ElementalAddrOp;
 class YieldElementOp;
 
-/// Is this an SSA value type for the value of a Fortran procedure
-/// designator ?
-inline bool isFortranProcedureValue(mlir::Type type) {
-  return type.isa<fir::BoxProcType>() ||
-         (type.isa<mlir::TupleType>() &&
-          fir::isCharacterProcedureTuple(type, /*acceptRawFunc=*/false));
-}
-
-/// Is this an SSA value type for the value of a Fortran expression?
-inline bool isFortranValueType(mlir::Type type) {
-  return type.isa<hlfir::ExprType>() || fir::isa_trivial(type) ||
-         isFortranProcedureValue(type);
-}
-
-/// Is this the value of a Fortran expression in an SSA value form?
-inline bool isFortranValue(mlir::Value value) {
-  return isFortranValueType(value.getType());
-}
-
-/// Is this a Fortran variable?
-/// Note that by "variable", it must be understood that the mlir::Value is
-/// a memory value of a storage that can be reason about as a Fortran object
-/// (its bounds, shape, and type parameters, if any, are retrievable).
-/// This does not imply that the mlir::Value points to a variable from the
-/// original source or can be legally defined: temporaries created to store
-/// expression values are considered to be variables, and so are PARAMETERs
-/// global constant address.
-inline bool isFortranEntity(mlir::Value value) {
-  return isFortranValue(value) || isFortranVariableType(value.getType());
-}
-
 /// Is this a Fortran variable for which the defining op carrying the Fortran
 /// attributes is visible?
 inline bool isFortranVariableWithAttributes(mlir::Value value) {
@@ -442,6 +411,13 @@ hlfir::ElementalOp cloneToElementalOp(mlir::Location loc,
                                       fir::FirOpBuilder &builder,
                                       hlfir::ElementalAddrOp elementalAddrOp);
 
+/// Return true, if \p elemental must produce a temporary array,
+/// for example, for the purpose of finalization. Note that such
+/// ElementalOp's must be optimized with caution. For example,
+/// completely inlining such ElementalOp into another one
+/// would be incorrect.
+bool elementalOpMustProduceTemp(hlfir::ElementalOp elemental);
+
 } // namespace hlfir
 
 #endif // FORTRAN_OPTIMIZER_BUILDER_HLFIRTOOLS_H

diff  --git a/flang/include/flang/Optimizer/Builder/Runtime/Derived.h b/flang/include/flang/Optimizer/Builder/Runtime/Derived.h
index 30998f8b0ea6558..d8b06f35b1da8a6 100644
--- a/flang/include/flang/Optimizer/Builder/Runtime/Derived.h
+++ b/flang/include/flang/Optimizer/Builder/Runtime/Derived.h
@@ -31,6 +31,11 @@ void genDerivedTypeInitialize(fir::FirOpBuilder &builder, mlir::Location loc,
 void genDerivedTypeDestroy(fir::FirOpBuilder &builder, mlir::Location loc,
                            mlir::Value box);
 
+/// Generate call to derived type finalization runtime routine
+/// to finalize \p box.
+void genDerivedTypeFinalize(fir::FirOpBuilder &builder, mlir::Location loc,
+                            mlir::Value box);
+
 /// Generate call to derived type destruction runtime routine to
 /// destroy \p box without finalization
 void genDerivedTypeDestroyWithoutFinalization(fir::FirOpBuilder &builder,

diff  --git a/flang/include/flang/Optimizer/HLFIR/HLFIRDialect.h b/flang/include/flang/Optimizer/HLFIR/HLFIRDialect.h
index b76063fb7c53536..aa68d0811c4868a 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIRDialect.h
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIRDialect.h
@@ -78,6 +78,37 @@ inline bool isPolymorphicType(mlir::Type type) {
   return fir::isPolymorphicType(type);
 }
 
+/// Is this an SSA value type for the value of a Fortran procedure
+/// designator ?
+inline bool isFortranProcedureValue(mlir::Type type) {
+  return type.isa<fir::BoxProcType>() ||
+         (type.isa<mlir::TupleType>() &&
+          fir::isCharacterProcedureTuple(type, /*acceptRawFunc=*/false));
+}
+
+/// Is this an SSA value type for the value of a Fortran expression?
+inline bool isFortranValueType(mlir::Type type) {
+  return type.isa<hlfir::ExprType>() || fir::isa_trivial(type) ||
+         isFortranProcedureValue(type);
+}
+
+/// Is this the value of a Fortran expression in an SSA value form?
+inline bool isFortranValue(mlir::Value value) {
+  return isFortranValueType(value.getType());
+}
+
+/// Is this a Fortran variable?
+/// Note that by "variable", it must be understood that the mlir::Value is
+/// a memory value of a storage that can be reason about as a Fortran object
+/// (its bounds, shape, and type parameters, if any, are retrievable).
+/// This does not imply that the mlir::Value points to a variable from the
+/// original source or can be legally defined: temporaries created to store
+/// expression values are considered to be variables, and so are PARAMETERs
+/// global constant address.
+inline bool isFortranEntity(mlir::Value value) {
+  return isFortranValue(value) || isFortranVariableType(value.getType());
+}
+
 bool isFortranScalarNumericalType(mlir::Type);
 bool isFortranNumericalArrayObject(mlir::Type);
 bool isFortranNumericalOrLogicalArrayObject(mlir::Type);
@@ -94,6 +125,13 @@ bool isPolymorphicObject(mlir::Type);
 mlir::Value genExprShape(mlir::OpBuilder &builder, const mlir::Location &loc,
                          const hlfir::ExprType &expr);
 
+/// Return true iff `ty` may have allocatable component.
+/// TODO: this actually belongs to FIRType.cpp, but the method's implementation
+/// depends on HLFIRDialect component. FIRType.cpp itself is part of FIRDialect
+/// that cannot depend on HLFIRBuilder (there will be a cyclic dependency).
+/// This has to be cleaned up, when HLFIR is the default.
+bool mayHaveAllocatableComponent(mlir::Type ty);
+
 } // namespace hlfir
 
 #endif // FORTRAN_OPTIMIZER_HLFIR_HLFIRDIALECT_H

diff  --git a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
index 1f584d6afd8fbcb..9fbf09331c099c0 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
@@ -705,6 +705,8 @@ def hlfir_EndAssociateOp : hlfir_Op<"end_associate", [MemoryEffects<[MemFree]>]>
 
   let description = [{
     Mark the end of life of a variable associated to an expression.
+    If the expression has a derived type that may contain allocatable
+    components, the variable operand must be a Fortran entity.
   }];
 
   let arguments = (ins AnyRefOrBoxLike:$var,
@@ -715,6 +717,7 @@ def hlfir_EndAssociateOp : hlfir_Op<"end_associate", [MemoryEffects<[MemFree]>]>
   }];
 
   let builders = [OpBuilder<(ins "hlfir::AssociateOp":$associate)>];
+  let hasVerifier = 1;
 }
 
 def hlfir_AsExprOp : hlfir_Op<"as_expr",
@@ -981,6 +984,11 @@ def hlfir_DestroyOp : hlfir_Op<"destroy", [MemoryEffects<[MemFree]>]> {
     Mark the last use of an hlfir.expr. This will be the point at which the
     buffer of an hlfir.expr, if any, will be deallocated if it was heap
     allocated.
+    If "finalize" attribute is set, the hlfir.expr value will be finalized
+    before the deallocation. Note that this implies that the hlfir.expr
+    is placed into a memory buffer, so that the library runtime
+    can be called on it. The element type of the hlfir.expr must be
+    derived type in this case.
     It is not required to create an hlfir.destroy operation for and hlfir.expr
     created inside an hlfir.elemental and returned in the hlfir.yield_element.
     The last use of such expression is implicit and an hlfir.destroy could
@@ -995,9 +1003,22 @@ def hlfir_DestroyOp : hlfir_Op<"destroy", [MemoryEffects<[MemFree]>]> {
     in bufferization instead.
   }];
 
-  let arguments = (ins hlfir_ExprType:$expr);
+  let arguments = (ins
+    hlfir_ExprType:$expr,
+    UnitAttr:$finalize
+  );
+
+  let assemblyFormat = [{
+    $expr (`finalize` $finalize^)? attr-dict `:` qualified(type($expr))
+  }];
+
+  let extraClassDeclaration = [{
+    bool mustFinalizeExpr() {
+      return getFinalize();
+    }
+  }];
 
-  let assemblyFormat = "$expr attr-dict `:` qualified(type($expr))";
+  let hasVerifier = 1;
 }
 
 def hlfir_CopyInOp : hlfir_Op<"copy_in", [MemoryEffects<[MemAlloc]>]> {

diff  --git a/flang/include/flang/Runtime/derived-api.h b/flang/include/flang/Runtime/derived-api.h
index 3bf631bc4d0c763..decba9f686d9201 100644
--- a/flang/include/flang/Runtime/derived-api.h
+++ b/flang/include/flang/Runtime/derived-api.h
@@ -37,6 +37,10 @@ void RTNAME(Initialize)(
 // storage.
 void RTNAME(Destroy)(const Descriptor &);
 
+// Finalizes the object and its components.
+void RTNAME(Finalize)(
+    const Descriptor &, const char *sourceFile = nullptr, int sourceLine = 0);
+
 /// Deallocates any allocatable/automatic components.
 /// Does not deallocate the descriptor's storage.
 /// Does not perform any finalization.

diff  --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp
index 59d059e27cf1aae..788aac2d583f93d 100644
--- a/flang/lib/Lower/ConvertCall.cpp
+++ b/flang/lib/Lower/ConvertCall.cpp
@@ -148,7 +148,7 @@ fir::ExtendedValue Fortran::lower::genCallOpAndResult(
     mlir::Location loc, Fortran::lower::AbstractConverter &converter,
     Fortran::lower::SymMap &symMap, Fortran::lower::StatementContext &stmtCtx,
     Fortran::lower::CallerInterface &caller, mlir::FunctionType callSiteType,
-    std::optional<mlir::Type> resultType) {
+    std::optional<mlir::Type> resultType, bool isElemental) {
   fir::FirOpBuilder &builder = converter.getFirOpBuilder();
   using PassBy = Fortran::lower::CallerInterface::PassEntityBy;
   // Handle cases where caller must allocate the result or a fir.box for it.
@@ -435,7 +435,13 @@ fir::ExtendedValue Fortran::lower::genCallOpAndResult(
     std::optional<Fortran::evaluate::DynamicType> retTy =
         caller.getCallDescription().proc().GetType();
     bool cleanupWithDestroy = false;
-    if (!fir::isPointerType(funcType.getResults()[0]) && retTy &&
+    // With HLFIR lowering, isElemental must be set to true
+    // if we are producing an elemental call. In this case,
+    // the elemental results must not be destroyed, instead,
+    // the resulting array result will be finalized/destroyed
+    // as needed by hlfir.destroy.
+    if (!isElemental && !fir::isPointerType(funcType.getResults()[0]) &&
+        retTy &&
         (retTy->category() == Fortran::common::TypeCategory::Derived ||
          retTy->IsPolymorphic() || retTy->IsUnlimitedPolymorphic())) {
       if (retTy->IsPolymorphic() || retTy->IsUnlimitedPolymorphic()) {
@@ -692,6 +698,14 @@ struct PreparedDummyArgument {
     cleanups.emplace_back(
         CallCleanUp{CallCleanUp::ExprAssociate{tempVar, wasCopied}});
   }
+  void pushExprAssociateCleanUp(hlfir::AssociateOp associate) {
+    mlir::Value hlfirBase = associate.getBase();
+    mlir::Value firBase = associate.getFirBase();
+    cleanups.emplace_back(CallCleanUp{CallCleanUp::ExprAssociate{
+        hlfir::mayHaveAllocatableComponent(hlfirBase.getType()) ? hlfirBase
+                                                                : firBase,
+        associate.getMustFreeStrorageFlag()}});
+  }
 
   mlir::Value dummy;
   // NOTE: the clean-ups are executed in reverse order.
@@ -896,8 +910,7 @@ static PreparedDummyArgument preparePresentUserCallActualArgument(
           loc, builder, hlfir::Entity{copy}, storageType, "adapt.valuebyref");
       entity = hlfir::Entity{associate.getBase()};
       // Register the temporary destruction after the call.
-      preparedDummy.pushExprAssociateCleanUp(
-          associate.getFirBase(), associate.getMustFreeStrorageFlag());
+      preparedDummy.pushExprAssociateCleanUp(associate);
     } else if (mustDoCopyInOut) {
       // Copy-in non contiguous variables.
       assert(entity.getType().isa<fir::BaseBoxType>() &&
@@ -924,8 +937,7 @@ static PreparedDummyArgument preparePresentUserCallActualArgument(
     hlfir::AssociateOp associate = hlfir::genAssociateExpr(
         loc, builder, entity, storageType, "adapt.valuebyref");
     entity = hlfir::Entity{associate.getBase()};
-    preparedDummy.pushExprAssociateCleanUp(associate.getFirBase(),
-                                           associate.getMustFreeStrorageFlag());
+    preparedDummy.pushExprAssociateCleanUp(associate);
     if (mustSetDynamicTypeToDummyType) {
       // Rebox the actual argument to the dummy argument's type, and make
       // sure that we pass a contiguous entity (i.e. make copy-in,
@@ -1201,7 +1213,8 @@ genUserCall(Fortran::lower::PreparedActualArguments &loweredActuals,
   // arguments.
   fir::ExtendedValue result = Fortran::lower::genCallOpAndResult(
       loc, callContext.converter, callContext.symMap, callContext.stmtCtx,
-      caller, callSiteType, callContext.resultType);
+      caller, callSiteType, callContext.resultType,
+      callContext.isElementalProcWithArrayArgs());
 
   /// Clean-up associations and copy-in.
   for (auto cleanUp : callCleanUps)
@@ -1687,9 +1700,14 @@ class ElementalCallBuilder {
     mlir::Value elemental =
         hlfir::genElementalOp(loc, builder, elementType, shape, typeParams,
                               genKernel, !mustBeOrdered, polymorphicMold);
+    // If the function result requires finalization, then it has to be done
+    // for the array result of the elemental call. We have to communicate
+    // this via the DestroyOp's attribute.
+    bool mustFinalizeExpr = impl().resultMayRequireFinalization(callContext);
     fir::FirOpBuilder *bldr = &builder;
-    callContext.stmtCtx.attachCleanup(
-        [=]() { bldr->create<hlfir::DestroyOp>(loc, elemental); });
+    callContext.stmtCtx.attachCleanup([=]() {
+      bldr->create<hlfir::DestroyOp>(loc, elemental, mustFinalizeExpr);
+    });
     return hlfir::EntityWithAttributes{elemental};
   }
 
@@ -1743,6 +1761,26 @@ class ElementalUserCallBuilder
     return {};
   }
 
+  bool resultMayRequireFinalization(CallContext &callContext) const {
+    std::optional<Fortran::evaluate::DynamicType> retTy =
+        caller.getCallDescription().proc().GetType();
+    if (!retTy)
+      return false;
+
+    if (retTy->IsPolymorphic() || retTy->IsUnlimitedPolymorphic())
+      fir::emitFatalError(
+          callContext.loc,
+          "elemental function call with [unlimited-]polymorphic result");
+
+    if (retTy->category() == Fortran::common::TypeCategory::Derived) {
+      const Fortran::semantics::DerivedTypeSpec &typeSpec =
+          retTy->GetDerivedTypeSpec();
+      return Fortran::semantics::IsFinalizable(typeSpec);
+    }
+
+    return false;
+  }
+
 private:
   Fortran::lower::CallerInterface &caller;
   mlir::FunctionType callSiteType;
@@ -1804,6 +1842,14 @@ class ElementalIntrinsicCallBuilder
     return {};
   }
 
+  bool resultMayRequireFinalization(
+      [[maybe_unused]] CallContext &callContext) const {
+    // FIXME: need access to the CallerInterface's return type
+    // to check if the result may need finalization (e.g. the result
+    // of MERGE).
+    return false;
+  }
+
 private:
   const Fortran::evaluate::SpecificIntrinsic *intrinsic;
   const fir::IntrinsicArgumentLoweringRules *argLowering;

diff  --git a/flang/lib/Lower/HlfirIntrinsics.cpp b/flang/lib/Lower/HlfirIntrinsics.cpp
index 62f63e376c815df..20e570044e8d4ae 100644
--- a/flang/lib/Lower/HlfirIntrinsics.cpp
+++ b/flang/lib/Lower/HlfirIntrinsics.cpp
@@ -330,6 +330,9 @@ std::optional<hlfir::EntityWithAttributes> Fortran::lower::lowerHlfirIntrinsic(
     const Fortran::lower::PreparedActualArguments &loweredActuals,
     const fir::IntrinsicArgumentLoweringRules *argLowering,
     mlir::Type stmtResultType) {
+  // If the result is of a derived type that may need finalization,
+  // we have to use DestroyOp with 'finalize' attribute for the result
+  // of the intrinsic operation.
   if (name == "sum")
     return HlfirSumLowering{builder, loc}.lower(loweredActuals, argLowering,
                                                 stmtResultType);
@@ -348,6 +351,7 @@ std::optional<hlfir::EntityWithAttributes> Fortran::lower::lowerHlfirIntrinsic(
   if (name == "dot_product")
     return HlfirDotProductLowering{builder, loc}.lower(
         loweredActuals, argLowering, stmtResultType);
+  // FIXME: the result may need finalization.
   if (name == "transpose")
     return HlfirTransposeLowering{builder, loc}.lower(
         loweredActuals, argLowering, stmtResultType);

diff  --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
index dd62aa0e370122f..7034d6e893e7e9a 100644
--- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp
+++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
@@ -1021,3 +1021,12 @@ hlfir::cloneToElementalOp(mlir::Location loc, fir::FirOpBuilder &builder,
                                elementalAddrOp.getShape(), typeParams,
                                genKernel, !elementalAddrOp.isOrdered());
 }
+
+bool hlfir::elementalOpMustProduceTemp(hlfir::ElementalOp elemental) {
+  for (mlir::Operation *useOp : elemental->getUsers())
+    if (auto destroy = mlir::dyn_cast<hlfir::DestroyOp>(useOp))
+      if (destroy.mustFinalizeExpr())
+        return true;
+
+  return false;
+}

diff  --git a/flang/lib/Optimizer/Builder/Runtime/Derived.cpp b/flang/lib/Optimizer/Builder/Runtime/Derived.cpp
index 8975656ffb38026..fe7e2d157ad9a6b 100644
--- a/flang/lib/Optimizer/Builder/Runtime/Derived.cpp
+++ b/flang/lib/Optimizer/Builder/Runtime/Derived.cpp
@@ -37,6 +37,18 @@ void fir::runtime::genDerivedTypeDestroy(fir::FirOpBuilder &builder,
   builder.create<fir::CallOp>(loc, func, args);
 }
 
+void fir::runtime::genDerivedTypeFinalize(fir::FirOpBuilder &builder,
+                                          mlir::Location loc, mlir::Value box) {
+  auto func = fir::runtime::getRuntimeFunc<mkRTKey(Finalize)>(loc, builder);
+  auto fTy = func.getFunctionType();
+  auto sourceFile = fir::factory::locationToFilename(builder, loc);
+  auto sourceLine =
+      fir::factory::locationToLineNo(builder, loc, fTy.getInput(2));
+  auto args = fir::runtime::createArguments(builder, loc, fTy, box, sourceFile,
+                                            sourceLine);
+  builder.create<fir::CallOp>(loc, func, args);
+}
+
 void fir::runtime::genDerivedTypeDestroyWithoutFinalization(
     fir::FirOpBuilder &builder, mlir::Location loc, mlir::Value box) {
   auto func = fir::runtime::getRuntimeFunc<mkRTKey(DestroyWithoutFinalization)>(

diff  --git a/flang/lib/Optimizer/HLFIR/IR/HLFIRDialect.cpp b/flang/lib/Optimizer/HLFIR/IR/HLFIRDialect.cpp
index 7ca6108a31acbb6..d3a6fb305c19919 100644
--- a/flang/lib/Optimizer/HLFIR/IR/HLFIRDialect.cpp
+++ b/flang/lib/Optimizer/HLFIR/IR/HLFIRDialect.cpp
@@ -207,3 +207,8 @@ mlir::Value hlfir::genExprShape(mlir::OpBuilder &builder,
   fir::ShapeOp shape = builder.create<fir::ShapeOp>(loc, shapeTy, extents);
   return shape.getResult();
 }
+
+bool hlfir::mayHaveAllocatableComponent(mlir::Type ty) {
+  return fir::isPolymorphicType(ty) || fir::isUnlimitedPolymorphicType(ty) ||
+         fir::isRecordWithAllocatableMember(hlfir::getFortranElementType(ty));
+}

diff  --git a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
index 760f7b343bdc736..ab648b2909c33ad 100644
--- a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
+++ b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
@@ -1237,10 +1237,28 @@ void hlfir::AssociateOp::build(mlir::OpBuilder &builder,
 void hlfir::EndAssociateOp::build(mlir::OpBuilder &builder,
                                   mlir::OperationState &result,
                                   hlfir::AssociateOp associate) {
-  return build(builder, result, associate.getFirBase(),
+  mlir::Value hlfirBase = associate.getBase();
+  mlir::Value firBase = associate.getFirBase();
+  // If EndAssociateOp may need to initiate the deallocation
+  // of allocatable components, it has to have access to the variable
+  // definition, so we cannot use the FIR base as the operand.
+  return build(builder, result,
+               hlfir::mayHaveAllocatableComponent(hlfirBase.getType())
+                   ? hlfirBase
+                   : firBase,
                associate.getMustFreeStrorageFlag());
 }
 
+mlir::LogicalResult hlfir::EndAssociateOp::verify() {
+  mlir::Value var = getVar();
+  if (hlfir::mayHaveAllocatableComponent(var.getType()) &&
+      !hlfir::isFortranEntity(var))
+    return emitOpError("that requires components deallocation must have var "
+                       "operand that is a Fortran entity");
+
+  return mlir::success();
+}
+
 //===----------------------------------------------------------------------===//
 // AsExprOp
 //===----------------------------------------------------------------------===//
@@ -1341,6 +1359,23 @@ void hlfir::NullOp::build(mlir::OpBuilder &builder,
                fir::ReferenceType::get(builder.getNoneType()));
 }
 
+//===----------------------------------------------------------------------===//
+// DestroyOp
+//===----------------------------------------------------------------------===//
+
+mlir::LogicalResult hlfir::DestroyOp::verify() {
+  if (mustFinalizeExpr()) {
+    mlir::Value expr = getExpr();
+    hlfir::ExprType exprTy = mlir::cast<hlfir::ExprType>(expr.getType());
+    mlir::Type elemTy = hlfir::getFortranElementType(exprTy);
+    if (!mlir::isa<fir::RecordType>(elemTy))
+      return emitOpError(
+          "the element type must be finalizable, when 'finalize' is set");
+  }
+
+  return mlir::success();
+}
+
 //===----------------------------------------------------------------------===//
 // CopyInOp
 //===----------------------------------------------------------------------===//

diff  --git a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
index 6fde7cffb11a4c0..e852a1887c8bf01 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
@@ -17,6 +17,7 @@
 #include "flang/Optimizer/Builder/HLFIRTools.h"
 #include "flang/Optimizer/Builder/MutableBox.h"
 #include "flang/Optimizer/Builder/Runtime/Allocatable.h"
+#include "flang/Optimizer/Builder/Runtime/Derived.h"
 #include "flang/Optimizer/Builder/Todo.h"
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FIROps.h"
@@ -439,8 +440,20 @@ static bool allOtherUsesAreSafeForAssociate(mlir::Value value,
        value.getParentRegion() != endAssociate->getParentRegion()))
     return false;
 
-  for (mlir::Operation *useOp : value.getUsers())
-    if (!mlir::isa<hlfir::DestroyOp>(useOp) && useOp != currentUse) {
+  for (mlir::Operation *useOp : value.getUsers()) {
+    // Ignore DestroyOp's that do not imply finalization.
+    // If finalization is implied, then we must delegate
+    // the finalization to the correspoding EndAssociateOp,
+    // but we currently do not; so we disable the buffer
+    // reuse in this case.
+    if (auto destroy = mlir::dyn_cast<hlfir::DestroyOp>(useOp)) {
+      if (destroy.mustFinalizeExpr())
+        return false;
+      else
+        continue;
+    }
+
+    if (useOp != currentUse) {
       // hlfir.shape_of and hlfir.get_length will not disrupt cleanup so it is
       // safe for hlfir.associate. These operations might read from the box and
       // so they need to come before the hflir.end_associate (which may
@@ -458,14 +471,18 @@ static bool allOtherUsesAreSafeForAssociate(mlir::Value value,
       }
       return false;
     }
+  }
   return true;
 }
 
 static void eraseAllUsesInDestroys(mlir::Value value,
                                    mlir::ConversionPatternRewriter &rewriter) {
   for (mlir::Operation *useOp : value.getUsers())
-    if (mlir::isa<hlfir::DestroyOp>(useOp))
-      rewriter.eraseOp(useOp);
+    if (auto destroy = mlir::dyn_cast<hlfir::DestroyOp>(useOp)) {
+      assert(!destroy.mustFinalizeExpr() &&
+             "deleting DestroyOp with finalize attribute");
+      rewriter.eraseOp(destroy);
+    }
 }
 
 struct AssociateOpConversion
@@ -592,9 +609,16 @@ struct AssociateOpConversion
   }
 };
 
-static void genFreeIfMustFree(mlir::Location loc, fir::FirOpBuilder &builder,
-                              mlir::Value var, mlir::Value mustFree) {
-  auto genFree = [&]() {
+static void genBufferDestruction(mlir::Location loc, fir::FirOpBuilder &builder,
+                                 mlir::Value var, mlir::Value mustFree,
+                                 bool mustFinalize) {
+  auto genFreeOrFinalize = [&](bool doFree, bool deallocComponents,
+                               bool doFinalize) {
+    if (!doFree && !deallocComponents && !doFinalize)
+      return;
+
+    mlir::Value addr = var;
+
     // fir::FreeMemOp operand type must be a fir::HeapType.
     mlir::Type heapType = fir::HeapType::get(
         hlfir::getFortranElementOrSequenceType(var.getType()));
@@ -608,20 +632,68 @@ static void genFreeIfMustFree(mlir::Location loc, fir::FirOpBuilder &builder,
       var = builder.create<fir::LoadOp>(loc, var);
       assert(mlir::isa<fir::ClassType>(var.getType()) &&
              fir::isAllocatableType(var.getType()));
-      var = builder.create<fir::BoxAddrOp>(loc, heapType, var);
+      addr = builder.create<fir::BoxAddrOp>(loc, heapType, var);
+      // Lowering currently does not produce DestroyOp with 'finalize'
+      // for polymorphic temporaries. It will have to do so, for example,
+      // for MERGE with polymorphic results.
+      if (mustFinalize)
+        TODO(loc, "finalizing polymorphic temporary in HLFIR");
     } else if (var.getType().isa<fir::BaseBoxType, fir::BoxCharType>()) {
-      var = builder.create<fir::BoxAddrOp>(loc, heapType, var);
-    } else if (!var.getType().isa<fir::HeapType>()) {
-      var = builder.create<fir::ConvertOp>(loc, heapType, var);
+      if (mustFinalize && !mlir::isa<fir::BaseBoxType>(var.getType()))
+        fir::emitFatalError(loc, "non-finalizable variable");
+
+      addr = builder.create<fir::BoxAddrOp>(loc, heapType, var);
+    } else {
+      if (!var.getType().isa<fir::HeapType>())
+        addr = builder.create<fir::ConvertOp>(loc, heapType, var);
+
+      if (mustFinalize || deallocComponents) {
+        // Embox the raw pointer using proper shape and type params
+        // (note that the shape might be visible via the array finalization
+        // routines).
+        if (!hlfir::isFortranEntity(var))
+          TODO(loc, "need a Fortran entity to create a box");
+
+        hlfir::Entity entity{var};
+        llvm::SmallVector<mlir::Value> lenParams;
+        hlfir::genLengthParameters(loc, builder, entity, lenParams);
+        mlir::Value shape;
+        if (entity.isArray())
+          shape = hlfir::genShape(loc, builder, entity);
+        mlir::Type boxType = fir::BoxType::get(heapType);
+        var = builder.createBox(loc, boxType, addr, shape, /*slice=*/nullptr,
+                                lenParams, /*tdesc=*/nullptr);
+      }
     }
-    builder.create<fir::FreeMemOp>(loc, var);
+
+    if (mustFinalize)
+      fir::runtime::genDerivedTypeFinalize(builder, loc, var);
+
+    // If there are allocatable components, they need to be deallocated
+    // (regardless of the mustFree and mustFinalize settings).
+    if (deallocComponents)
+      fir::runtime::genDerivedTypeDestroyWithoutFinalization(builder, loc, var);
+
+    if (doFree)
+      builder.create<fir::FreeMemOp>(loc, addr);
+  };
+  bool deallocComponents = hlfir::mayHaveAllocatableComponent(var.getType());
+
+  auto genFree = [&]() {
+    genFreeOrFinalize(/*doFree=*/true, /*deallocComponents=*/false,
+                      /*doFinalize=*/false);
   };
   if (auto cstMustFree = fir::getIntIfConstant(mustFree)) {
-    if (*cstMustFree != 0)
-      genFree();
-    // else, mustFree is false, nothing to do.
+    genFreeOrFinalize(*cstMustFree != 0 ? true : false, deallocComponents,
+                      mustFinalize);
     return;
   }
+
+  // If mustFree is dynamic, first, deallocate any allocatable
+  // components and finalize.
+  genFreeOrFinalize(/*doFree=*/false, deallocComponents,
+                    /*doFinalize=*/mustFinalize);
+  // Conditionally free the memory.
   builder.genIfThen(loc, mustFree).genThen(genFree).end();
 }
 
@@ -635,7 +707,8 @@ struct EndAssociateOpConversion
                   mlir::ConversionPatternRewriter &rewriter) const override {
     mlir::Location loc = endAssociate->getLoc();
     fir::FirOpBuilder builder(rewriter, endAssociate.getOperation());
-    genFreeIfMustFree(loc, builder, adaptor.getVar(), adaptor.getMustFree());
+    genBufferDestruction(loc, builder, adaptor.getVar(), adaptor.getMustFree(),
+                         /*mustFinalize=*/false);
     rewriter.eraseOp(endAssociate);
     return mlir::success();
   }
@@ -655,9 +728,16 @@ struct DestroyOpConversion
     if (!fir::isa_trivial(bufferizedExpr.getType())) {
       fir::FirOpBuilder builder(rewriter, destroy.getOperation());
       mlir::Value mustFree = getBufferizedExprMustFreeFlag(adaptor.getExpr());
-      mlir::Value firBase = bufferizedExpr.getFirBase();
-      genFreeIfMustFree(loc, builder, firBase, mustFree);
+      // Passing FIR base might be enough for cases when
+      // component deallocation and finalization are not required.
+      // If extra BoxAddr operations become a performance problem,
+      // we may pass both bases and let genBufferDestruction decide
+      // which one to use.
+      mlir::Value base = bufferizedExpr.getBase();
+      genBufferDestruction(loc, builder, base, mustFree,
+                           destroy.mustFinalizeExpr());
     }
+
     rewriter.eraseOp(destroy);
     return mlir::success();
   }
@@ -772,6 +852,12 @@ struct ElementalOpConversion
     // Assign the element value to the temp element for this iteration.
     auto tempElement =
         hlfir::getElementAt(loc, builder, temp, loopNest.oneBasedIndices);
+    // FIXME: if the elemental result is a function result temporary
+    // of a derived type, we have to make sure that we are either
+    // deallocate any allocatable/automatic components after the assignment
+    // or that we do not do the deep copy with the AssignOp. The latter
+    // seems to be preferrable, because the deep copy is more expensive.
+    // The shallow copy may be done with a load/store of the RecordType scalar.
     builder.create<hlfir::AssignOp>(loc, elementValue, tempElement,
                                     /*realloc=*/false,
                                     /*keep_lhs_length_if_realloc=*/false,

diff  --git a/flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp b/flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
index bbc989f9e046e71..a99038fdfba9853 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
@@ -41,6 +41,11 @@ getTwoUses(hlfir::ElementalOp elemental) {
     return std::nullopt;
   }
 
+  // If the ElementalOp must produce a temporary (e.g. for
+  // finalization purposes), then we cannot inline it.
+  if (hlfir::elementalOpMustProduceTemp(elemental))
+    return std::nullopt;
+
   hlfir::ApplyOp apply;
   hlfir::DestroyOp destroy;
   for (mlir::Operation *user : users)

diff  --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
index 748b91d9f457e3d..7abfa20493c736b 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
@@ -302,6 +302,13 @@ ElementalAssignBufferization::findMatch(hlfir::ElementalOp elemental) {
     return std::nullopt;
   }
 
+  // If the ElementalOp must produce a temporary (e.g. for
+  // finalization purposes), then we cannot inline it.
+  if (hlfir::elementalOpMustProduceTemp(elemental)) {
+    LLVM_DEBUG(llvm::dbgs() << "ElementalOp must produce a temp\n");
+    return std::nullopt;
+  }
+
   MatchInfo match;
   for (mlir::Operation *user : users)
     mlir::TypeSwitch<mlir::Operation *, void>(user)

diff  --git a/flang/runtime/derived-api.cpp b/flang/runtime/derived-api.cpp
index 32d4bb26608b4dd..66123030f98b57d 100644
--- a/flang/runtime/derived-api.cpp
+++ b/flang/runtime/derived-api.cpp
@@ -41,6 +41,18 @@ void RTNAME(Destroy)(const Descriptor &descriptor) {
   }
 }
 
+void RTNAME(Finalize)(
+    const Descriptor &descriptor, const char *sourceFile, int sourceLine) {
+  if (const DescriptorAddendum * addendum{descriptor.Addendum()}) {
+    if (const auto *derived{addendum->derivedType()}) {
+      if (!derived->noFinalizationNeeded()) {
+        Terminator terminator{sourceFile, sourceLine};
+        Finalize(descriptor, *derived, &terminator);
+      }
+    }
+  }
+}
+
 bool RTNAME(ClassIs)(
     const Descriptor &descriptor, const typeInfo::DerivedType &derivedType) {
   if (const DescriptorAddendum * addendum{descriptor.Addendum()}) {

diff  --git a/flang/test/HLFIR/associate-codegen.fir b/flang/test/HLFIR/associate-codegen.fir
index 3d8840ba893a54a..4cd50af27ebe04f 100644
--- a/flang/test/HLFIR/associate-codegen.fir
+++ b/flang/test/HLFIR/associate-codegen.fir
@@ -428,7 +428,7 @@ func.func @_QPtest_multitple_associates_for_same_expr() {
 // CHECK:           %[[VAL_29:.*]] = fir.convert %[[VAL_24]]#1 : (!fir.heap<!fir.array<10x!fir.char<1>>>) -> !fir.ref<!fir.array<10x!fir.char<1>>>
 // CHECK:           %[[VAL_30:.*]] = fir.convert %[[VAL_29]] : (!fir.ref<!fir.array<10x!fir.char<1>>>) -> !fir.heap<!fir.array<10x!fir.char<1>>>
 // CHECK:           fir.freemem %[[VAL_30]] : !fir.heap<!fir.array<10x!fir.char<1>>>
-// CHECK:           fir.freemem %[[VAL_4]]#1 : !fir.heap<!fir.array<10x!fir.char<1>>>
+// CHECK:           fir.freemem %[[VAL_4]]#0 : !fir.heap<!fir.array<10x!fir.char<1>>>
 // CHECK:           return
 // CHECK:         }
 

diff  --git a/flang/test/HLFIR/bufferize-destroy-for-derived.fir b/flang/test/HLFIR/bufferize-destroy-for-derived.fir
new file mode 100644
index 000000000000000..5c12bc580cfeadb
--- /dev/null
+++ b/flang/test/HLFIR/bufferize-destroy-for-derived.fir
@@ -0,0 +1,97 @@
+// Test buffer destruction for hlfir.destroy operations with
+// operands of derived types.
+// RUN: fir-opt --bufferize-hlfir %s | FileCheck %s
+
+func.func @_QPtest1(%arg0: !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>> {fir.bindc_name = "x"}) {
+  %c0 = arith.constant 0 : index
+  %0 = fir.alloca !fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}> {bindc_name = ".result"}
+  %1:2 = hlfir.declare %arg0 {uniq_name = "_QFtest1Ex"} : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>)
+  %2:3 = fir.box_dims %1#0, %c0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, index) -> (index, index, index)
+  %3 = fir.shape %2#1 : (index) -> !fir.shape<1>
+  %4 = hlfir.elemental %3 unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>> {
+  ^bb0(%arg1: index):
+    %5 = hlfir.designate %1#0 (%arg1)  : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, index) -> !fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>
+    %6 = fir.call @_QPelem1(%5) fastmath<contract> : (!fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>) -> !fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>
+    fir.save_result %6 to %0 : !fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>, !fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>
+    %7:2 = hlfir.declare %0 {uniq_name = ".tmp.func_result"} : (!fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>) -> (!fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>, !fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>)
+    hlfir.yield_element %7#0 : !fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>
+  }
+  hlfir.assign %4 to %1#0 : !hlfir.expr<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>
+  hlfir.destroy %4 : !hlfir.expr<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>
+  return
+}
+// CHECK-LABEL:   func.func @_QPtest1(
+// CHECK:             hlfir.assign %{{.*}} to %{{.*}} temporary_lhs : !fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>, !fir.ref<!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>
+// CHECK:           hlfir.assign %[[VAL_7:.*]]#0 to %{{.*}}#0 : !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>
+// CHECK-NEXT:      %[[VAL_18:.*]] = fir.box_addr %[[VAL_7]]#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>) -> !fir.heap<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>
+// CHECK-NEXT:      %[[VAL_19:.*]] = fir.convert %[[VAL_7]]#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>) -> !fir.box<none>
+// CHECK-NEXT:      %[[VAL_20:.*]] = fir.call @_FortranADestroyWithoutFinalization(%[[VAL_19]]) : (!fir.box<none>) -> none
+// CHECK-NEXT:      fir.freemem %[[VAL_18]] : !fir.heap<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>
+// CHECK-NEXT:      return
+// CHECK-NEXT:    }
+
+func.func @_QPtest2(%arg0: !fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>> {fir.bindc_name = "x"}) {
+  %c0 = arith.constant 0 : index
+  %0 = fir.alloca !fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}> {bindc_name = ".result"}
+  %1:2 = hlfir.declare %arg0 {uniq_name = "_QFtest2Ex"} : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>)
+  %2:3 = fir.box_dims %1#0, %c0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, index) -> (index, index, index)
+  %3 = fir.shape %2#1 : (index) -> !fir.shape<1>
+  %4 = hlfir.elemental %3 unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>> {
+  ^bb0(%arg1: index):
+    %5 = hlfir.designate %1#0 (%arg1)  : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, index) -> !fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>
+    %6 = fir.call @_QPelem2(%5) fastmath<contract> : (!fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>) -> !fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>
+    fir.save_result %6 to %0 : !fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>, !fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>
+    %7:2 = hlfir.declare %0 {uniq_name = ".tmp.func_result"} : (!fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>) -> (!fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>, !fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>)
+    hlfir.yield_element %7#0 : !fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>
+  }
+  hlfir.assign %4 to %1#0 : !hlfir.expr<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>
+  hlfir.destroy %4 finalize : !hlfir.expr<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>
+  return
+}
+// CHECK-LABEL:   func.func @_QPtest2(
+// CHECK:             hlfir.assign %{{.*}}#0 to %{{.*}} temporary_lhs : !fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>, !fir.ref<!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>
+// CHECK:           hlfir.assign %[[VAL_7:.*]]#0 to %{{.*}}#0 : !fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>
+// CHECK-NEXT:      %[[VAL_18:.*]] = fir.box_addr %[[VAL_7]]#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>) -> !fir.heap<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>
+// CHECK-NEXT:      %[[VAL_19:.*]] = fir.address_of(@_QQcl.{{.*}}) : !fir.ref<!fir.char<1,{{[0-9]*}}>>
+// CHECK-NEXT:      %[[VAL_20:.*]] = arith.constant {{[0-9]*}} : index
+// CHECK-NEXT:      %[[VAL_21:.*]] = arith.constant {{[0-9]*}} : i32
+// CHECK-NEXT:      %[[VAL_22:.*]] = fir.convert %[[VAL_7]]#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>) -> !fir.box<none>
+// CHECK-NEXT:      %[[VAL_23:.*]] = fir.convert %[[VAL_19]] : (!fir.ref<!fir.char<1,{{[0-9]*}}>>) -> !fir.ref<i8>
+// CHECK-NEXT:      %[[VAL_24:.*]] = fir.call @_FortranAFinalize(%[[VAL_22]], %[[VAL_23]], %[[VAL_21]]) : (!fir.box<none>, !fir.ref<i8>, i32) -> none
+// CHECK-NEXT:      %[[VAL_25:.*]] = fir.convert %[[VAL_7]]#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>) -> !fir.box<none>
+// CHECK-NEXT:      %[[VAL_26:.*]] = fir.call @_FortranADestroyWithoutFinalization(%[[VAL_25]]) : (!fir.box<none>) -> none
+// CHECK-NEXT:      fir.freemem %[[VAL_18]] : !fir.heap<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>
+// CHECK-NEXT:      return
+// CHECK-NEXT:    }
+
+func.func @_QPtest3(%arg0: !fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>> {fir.bindc_name = "x"}) {
+  %c0 = arith.constant 0 : index
+  %0 = fir.alloca !fir.type<_QMtypesTt3{x:f32}> {bindc_name = ".result"}
+  %1:2 = hlfir.declare %arg0 {uniq_name = "_QFtest3Ex"} : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>)
+  %2:3 = fir.box_dims %1#0, %c0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, index) -> (index, index, index)
+  %3 = fir.shape %2#1 : (index) -> !fir.shape<1>
+  %4 = hlfir.elemental %3 unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt3{x:f32}>> {
+  ^bb0(%arg1: index):
+    %5 = hlfir.designate %1#0 (%arg1)  : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, index) -> !fir.ref<!fir.type<_QMtypesTt3{x:f32}>>
+    %6 = fir.call @_QPelem3(%5) fastmath<contract> : (!fir.ref<!fir.type<_QMtypesTt3{x:f32}>>) -> !fir.type<_QMtypesTt3{x:f32}>
+    fir.save_result %6 to %0 : !fir.type<_QMtypesTt3{x:f32}>, !fir.ref<!fir.type<_QMtypesTt3{x:f32}>>
+    %7:2 = hlfir.declare %0 {uniq_name = ".tmp.func_result"} : (!fir.ref<!fir.type<_QMtypesTt3{x:f32}>>) -> (!fir.ref<!fir.type<_QMtypesTt3{x:f32}>>, !fir.ref<!fir.type<_QMtypesTt3{x:f32}>>)
+    hlfir.yield_element %7#0 : !fir.ref<!fir.type<_QMtypesTt3{x:f32}>>
+  }
+  hlfir.assign %4 to %1#0 : !hlfir.expr<?x!fir.type<_QMtypesTt3{x:f32}>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>
+  hlfir.destroy %4 finalize : !hlfir.expr<?x!fir.type<_QMtypesTt3{x:f32}>>
+  return
+}
+// CHECK-LABEL:   func.func @_QPtest3(
+// CHECK:             hlfir.assign %{{.*}}#0 to %{{.*}} temporary_lhs : !fir.ref<!fir.type<_QMtypesTt3{x:f32}>>, !fir.ref<!fir.type<_QMtypesTt3{x:f32}>>
+// CHECK:           hlfir.assign %[[VAL_7:.*]]#0 to %{{.*}}#0 : !fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>
+// CHECK-NEXT:      %[[VAL_18:.*]] = fir.box_addr %[[VAL_7]]#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>) -> !fir.heap<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>
+// CHECK-NEXT:      %[[VAL_19:.*]] = fir.address_of(@_QQcl.{{.*}}) : !fir.ref<!fir.char<1,{{[0-9]*}}>>
+// CHECK-NEXT:      %[[VAL_20:.*]] = arith.constant {{[0-9]*}} : index
+// CHECK-NEXT:      %[[VAL_21:.*]] = arith.constant {{[0-9]*}} : i32
+// CHECK-NEXT:      %[[VAL_22:.*]] = fir.convert %[[VAL_7]]#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>) -> !fir.box<none>
+// CHECK-NEXT:      %[[VAL_23:.*]] = fir.convert %[[VAL_19]] : (!fir.ref<!fir.char<1,{{[0-9]*}}>>) -> !fir.ref<i8>
+// CHECK-NEXT:      %[[VAL_24:.*]] = fir.call @_FortranAFinalize(%[[VAL_22]], %[[VAL_23]], %[[VAL_21]]) : (!fir.box<none>, !fir.ref<i8>, i32) -> none
+// CHECK-NEXT:      fir.freemem %[[VAL_18]] : !fir.heap<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>
+// CHECK-NEXT:      return
+// CHECK-NEXT:    }

diff  --git a/flang/test/HLFIR/bufferize-end-associate-for-derived.fir b/flang/test/HLFIR/bufferize-end-associate-for-derived.fir
new file mode 100644
index 000000000000000..089fe574893db5c
--- /dev/null
+++ b/flang/test/HLFIR/bufferize-end-associate-for-derived.fir
@@ -0,0 +1,52 @@
+// Test buffer destruction for hlfir.end_associate operations with
+// operands of derived types.
+// RUN: fir-opt --bufferize-hlfir %s | FileCheck %s
+
+func.func @_QPtest1(%arg0: !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>> {fir.bindc_name = "x"}) {
+  %c0 = arith.constant 0 : index
+  %0:2 = hlfir.declare %arg0 {uniq_name = "_QFtest1Ex"} : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>)
+  %1 = hlfir.as_expr %0#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>) -> !hlfir.expr<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>
+  %2:3 = fir.box_dims %0#0, %c0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, index) -> (index, index, index)
+  %3 = fir.shape %2#1 : (index) -> !fir.shape<1>
+  %4:3 = hlfir.associate %1(%3) {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>, !fir.shape<1>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, !fir.ref<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, i1)
+  %5 = fir.convert %4#1 : (!fir.ref<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>) -> !fir.ref<!fir.array<10x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>
+  fir.call @_QPcallee1(%5) fastmath<contract> : (!fir.ref<!fir.array<10x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>) -> ()
+  hlfir.end_associate %4#0, %4#2 : !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:!fir.box<!fir.heap<f32>>}>>>, i1
+  return
+}
+// CHECK-LABEL:   func.func @_QPtest1(
+// CHECK-NOT:       fir.call @_Fortran
+// CHECK:           fir.call @_FortranADestroyWithoutFinalization(%{{.*}}) : (!fir.box<none>) -> none
+// CHECK-NOT:       fir.call @_Fortran
+
+func.func @_QPtest2(%arg0: !fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>> {fir.bindc_name = "x"}) {
+  %c0 = arith.constant 0 : index
+  %0:2 = hlfir.declare %arg0 {uniq_name = "_QFtest2Ex"} : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>)
+  %1 = hlfir.as_expr %0#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>) -> !hlfir.expr<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>
+  %2:3 = fir.box_dims %0#0, %c0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, index) -> (index, index, index)
+  %3 = fir.shape %2#1 : (index) -> !fir.shape<1>
+  %4:3 = hlfir.associate %1(%3) {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>, !fir.shape<1>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, !fir.ref<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, i1)
+  %5 = fir.convert %4#1 : (!fir.ref<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>) -> !fir.ref<!fir.array<10x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>
+  fir.call @_QPcallee2(%5) fastmath<contract> : (!fir.ref<!fir.array<10x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>) -> ()
+  hlfir.end_associate %4#0, %4#2 : !fir.box<!fir.array<?x!fir.type<_QMtypesTt2{x:!fir.box<!fir.heap<f32>>}>>>, i1
+  return
+}
+// CHECK-LABEL:   func.func @_QPtest2(
+// CHECK-NOT:       fir.call @_Fortran
+// CHECK:           fir.call @_FortranADestroyWithoutFinalization(%{{.*}}) : (!fir.box<none>) -> none
+// CHECK-NOT:       fir.call @_Fortran
+
+func.func @_QPtest3(%arg0: !fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>> {fir.bindc_name = "x"}) {
+  %c0 = arith.constant 0 : index
+  %0:2 = hlfir.declare %arg0 {uniq_name = "_QFtest3Ex"} : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>)
+  %1 = hlfir.as_expr %0#0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>) -> !hlfir.expr<?x!fir.type<_QMtypesTt3{x:f32}>>
+  %2:3 = fir.box_dims %0#0, %c0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, index) -> (index, index, index)
+  %3 = fir.shape %2#1 : (index) -> !fir.shape<1>
+  %4:3 = hlfir.associate %1(%3) {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<?x!fir.type<_QMtypesTt3{x:f32}>>, !fir.shape<1>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, !fir.ref<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, i1)
+  %5 = fir.convert %4#1 : (!fir.ref<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>) -> !fir.ref<!fir.array<10x!fir.type<_QMtypesTt3{x:f32}>>>
+  fir.call @_QPcallee3(%5) fastmath<contract> : (!fir.ref<!fir.array<10x!fir.type<_QMtypesTt3{x:f32}>>>) -> ()
+  hlfir.end_associate %4#1, %4#2 : !fir.ref<!fir.array<?x!fir.type<_QMtypesTt3{x:f32}>>>, i1
+  return
+}
+// CHECK-LABEL:   func.func @_QPtest3(
+// CHECK-NOT:       fir.call @_Fortran

diff  --git a/flang/test/HLFIR/bufferize-poly-expr.fir b/flang/test/HLFIR/bufferize-poly-expr.fir
index 31987643d1b7673..df43c74b2aaccba 100644
--- a/flang/test/HLFIR/bufferize-poly-expr.fir
+++ b/flang/test/HLFIR/bufferize-poly-expr.fir
@@ -30,7 +30,7 @@ func.func @test_poly_expr_without_associate() {
 // CHECK:           %[[VAL_14:.*]] = fir.insert_value %[[VAL_13]], %[[VAL_7]], [1 : index] : (tuple<!fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>, i1>, i1) -> tuple<!fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>, i1>
 // CHECK:           %[[VAL_15:.*]] = fir.insert_value %[[VAL_14]], %[[VAL_8]]#0, [0 : index] : (tuple<!fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>, i1>, !fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>) -> tuple<!fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>, i1>
 // CHECK:           hlfir.assign %[[VAL_8]]#0 to %[[VAL_2]]#0 realloc : !fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>, !fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>
-// CHECK:           %[[VAL_16:.*]] = fir.load %[[VAL_8]]#1 : !fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>
+// CHECK:           %[[VAL_16:.*]] = fir.load %[[VAL_8]]#0 : !fir.ref<!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>>
 // CHECK:           %[[VAL_17:.*]] = fir.box_addr %[[VAL_16]] : (!fir.class<!fir.heap<!fir.type<_QFtestTt{c:i32}>>>) -> !fir.heap<!fir.type<_QFtestTt{c:i32}>>
 // CHECK:           fir.freemem %[[VAL_17]] : !fir.heap<!fir.type<_QFtestTt{c:i32}>>
 // CHECK:           return

diff  --git a/flang/test/HLFIR/destroy.fir b/flang/test/HLFIR/destroy.fir
index 2e6f62f721777b4..5a2f7bfa55e1b13 100644
--- a/flang/test/HLFIR/destroy.fir
+++ b/flang/test/HLFIR/destroy.fir
@@ -9,3 +9,19 @@ func.func @test(%expr : !hlfir.expr<?x?xf64>) {
 // CHECK-LABEL: func.func @test(
 // CHECK-SAME:    %[[VAL_0:.*]]: !hlfir.expr<?x?xf64>) {
 // CHECK:   hlfir.destroy %[[VAL_0]] : !hlfir.expr<?x?xf64>
+
+func.func @test_finalize_dt(%expr : !hlfir.expr<?x!fir.type<_QMtypesTt>>) {
+  hlfir.destroy %expr finalize : !hlfir.expr<?x!fir.type<_QMtypesTt>>
+  return
+}
+// CHECK-LABEL:   func.func @test_finalize_dt(
+// CHECK-SAME:        %[[VAL_0:.*]]: !hlfir.expr<?x!fir.type<_QMtypesTt>>) {
+// CHECK:           hlfir.destroy %[[VAL_0]] finalize : !hlfir.expr<?x!fir.type<_QMtypesTt>>
+
+func.func @test_finalize_poly(%expr : !hlfir.expr<?x!fir.type<_QMtypesTt>?>) {
+  hlfir.destroy %expr finalize : !hlfir.expr<?x!fir.type<_QMtypesTt>?>
+  return
+}
+// CHECK-LABEL:   func.func @test_finalize_poly(
+// CHECK-SAME:        %[[VAL_0:.*]]: !hlfir.expr<?x!fir.type<_QMtypesTt>?>) {
+// CHECK:           hlfir.destroy %[[VAL_0]] finalize : !hlfir.expr<?x!fir.type<_QMtypesTt>?>

diff  --git a/flang/test/HLFIR/elemental-codegen-nested.fir b/flang/test/HLFIR/elemental-codegen-nested.fir
index 1d9586204b6a2aa..3ef296249c7d64c 100644
--- a/flang/test/HLFIR/elemental-codegen-nested.fir
+++ b/flang/test/HLFIR/elemental-codegen-nested.fir
@@ -47,7 +47,7 @@
 // CHECK:           %[[VAL_37:.*]] = fir.insert_value %[[VAL_36]], %[[VAL_15]], [1 : index] : (tuple<!fir.heap<!fir.array<2xf32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<2xf32>>, i1>
 // CHECK:           %[[VAL_38:.*]] = fir.insert_value %[[VAL_37]], %[[VAL_14]]#0, [0 : index] : (tuple<!fir.heap<!fir.array<2xf32>>, i1>, !fir.heap<!fir.array<2xf32>>) -> tuple<!fir.heap<!fir.array<2xf32>>, i1>
 // CHECK:           hlfir.assign %[[VAL_14]]#0 to %[[VAL_4]]#0 : !fir.heap<!fir.array<2xf32>>, !fir.ref<!fir.array<2xf32>>
-// CHECK:           fir.freemem %[[VAL_14]]#1 : !fir.heap<!fir.array<2xf32>>
+// CHECK:           fir.freemem %[[VAL_14]]#0 : !fir.heap<!fir.array<2xf32>>
 // CHECK:           return
 // CHECK:         }
 func.func @_QPtest(%arg0: !fir.ref<f32> {fir.bindc_name = "pi"}, %arg1: !fir.ref<!fir.array<2xf32>> {fir.bindc_name = "h1"}) {

diff  --git a/flang/test/HLFIR/inline-elemental.fir b/flang/test/HLFIR/inline-elemental.fir
index c64c6b6a204b755..2d3beace4c75915 100644
--- a/flang/test/HLFIR/inline-elemental.fir
+++ b/flang/test/HLFIR/inline-elemental.fir
@@ -310,3 +310,40 @@ func.func @noinline_ordered(%arg0: !fir.box<!fir.array<?xi32>> {fir.bindc_name =
 // CHECK:           hlfir.destroy %[[VAL_26:.*]] : !hlfir.expr<?xi32>
 // CHECK:           return
 // CHECK:         }
+
+// Check that the elemental is not inlined, because its array result
+// must be finalized.
+// FIXME: the inlining is actually blocked by the type check
+// between the yield_element and apply. When this is fixed,
+// the test should keep passing.
+func.func @noinline_due_to_finalization(%arg0: !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:f32}>>> {fir.bindc_name = "x"}) {
+  %c0 = arith.constant 0 : index
+  %0 = fir.alloca !fir.type<_QMtypesTt1{x:f32}> {bindc_name = ".result"}
+  %1:2 = hlfir.declare %arg0 {uniq_name = "_QFtest1Ex"} : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:f32}>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:f32}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:f32}>>>)
+  %2:3 = fir.box_dims %1#0, %c0 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:f32}>>>, index) -> (index, index, index)
+  %3 = fir.shape %2#1 : (index) -> !fir.shape<1>
+  %4 = hlfir.elemental %3 unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>> {
+  ^bb0(%arg1: index):
+    %6 = hlfir.designate %1#0 (%arg1)  : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:f32}>>>, index) -> !fir.ref<!fir.type<_QMtypesTt1{x:f32}>>
+    %7 = fir.call @_QPelem1(%6) fastmath<contract> : (!fir.ref<!fir.type<_QMtypesTt1{x:f32}>>) -> !fir.type<_QMtypesTt1{x:f32}>
+    fir.save_result %7 to %0 : !fir.type<_QMtypesTt1{x:f32}>, !fir.ref<!fir.type<_QMtypesTt1{x:f32}>>
+    %8:2 = hlfir.declare %0 {uniq_name = ".tmp.func_result"} : (!fir.ref<!fir.type<_QMtypesTt1{x:f32}>>) -> (!fir.ref<!fir.type<_QMtypesTt1{x:f32}>>, !fir.ref<!fir.type<_QMtypesTt1{x:f32}>>)
+    hlfir.yield_element %8#0 : !fir.ref<!fir.type<_QMtypesTt1{x:f32}>>
+  }
+  %5 = hlfir.elemental %3 unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>> {
+  ^bb0(%arg1: index):
+    %6 = hlfir.apply %4, %arg1 : (!hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>>, index) -> !hlfir.expr<!fir.type<_QMtypesTt1{x:f32}>>
+    %7 = hlfir.no_reassoc %6 : !hlfir.expr<!fir.type<_QMtypesTt1{x:f32}>>
+    hlfir.yield_element %7 : !hlfir.expr<!fir.type<_QMtypesTt1{x:f32}>>
+  }
+  hlfir.assign %5 to %1#0 : !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt1{x:f32}>>>
+  hlfir.destroy %5 : !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>>
+  hlfir.destroy %4 finalize : !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>>
+  return
+}
+// CHECK-LABEL:   func.func @noinline_due_to_finalization(
+// CHECK:           %[[VAL_6:.*]] = hlfir.elemental %{{.*}} unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>> {
+// CHECK:           %[[VAL_11:.*]] = hlfir.elemental %{{.*}} unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>> {
+// CHECK:             %[[VAL_13:.*]] = hlfir.apply %[[VAL_6]], %{{.*}} : (!hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>>, index) -> !hlfir.expr<!fir.type<_QMtypesTt1{x:f32}>>
+// CHECK:           hlfir.destroy %[[VAL_11]] : !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>>
+// CHECK:           hlfir.destroy %[[VAL_6]] finalize : !hlfir.expr<?x!fir.type<_QMtypesTt1{x:f32}>>

diff  --git a/flang/test/HLFIR/invalid.fir b/flang/test/HLFIR/invalid.fir
index 112ab4be57f6bfc..49b6c1852b5982c 100644
--- a/flang/test/HLFIR/invalid.fir
+++ b/flang/test/HLFIR/invalid.fir
@@ -1165,3 +1165,19 @@ func.func @elemental_poly_4(%shape : index) {
   }
   return
 }
+
+// -----
+func.func @destroy_with_finalize(%expr: !hlfir.expr<?xi32>) {
+// expected-error at +1 {{'hlfir.destroy' op the element type must be finalizable, when 'finalize' is set}}
+  hlfir.destroy %expr finalize : !hlfir.expr<?xi32>
+  return
+}
+
+// -----
+
+func.func @end_associate_with_alloc_comp(%var: !hlfir.expr<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>, %shape: !fir.shape<1>) {
+  %4:3 = hlfir.associate %var(%shape) {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>, !fir.shape<1>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, !fir.ref<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1)
+// expected-error at +1 {{'hlfir.end_associate' op that requires components deallocation must have var operand that is a Fortran entity}}
+  hlfir.end_associate %4#1, %4#2 : !fir.ref<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1
+  return
+}

diff  --git a/flang/test/HLFIR/mul_transpose.f90 b/flang/test/HLFIR/mul_transpose.f90
index 97fe3d87a2d0b5b..6f7e59041efbc19 100644
--- a/flang/test/HLFIR/mul_transpose.f90
+++ b/flang/test/HLFIR/mul_transpose.f90
@@ -97,7 +97,8 @@ subroutine mul_transpose(a, b, res)
 ! CHECK-BUFFERING:      %[[TRANSPOSE_RES_HEAP:.*]] = fir.convert %[[TRANSPOSE_RES_REF]] : (!fir.ref<!fir.array<?x?xf32>>) -> !fir.heap<!fir.array<?x?xf32>>
 ! CHECK-BUFFERING-NEXT: fir.freemem %[[TRANSPOSE_RES_HEAP]]
 ! CHECK-BUFFERING-NEXT: hlfir.assign %[[MUL_RES_VAR]]#0 to %[[RES_DECL]]#0 : !fir.box<!fir.array<?x?xf32>>, !fir.ref<!fir.array<1x2xf32>>
-! CHECK-BUFFERING-NEXT: fir.freemem %[[MUL_RES_VAR]]#1
+! CHECK-BUFFERING-NEXT: %[[MUL_RES_HEAP:.*]] = fir.box_addr %[[MUL_RES_VAR]]#0 : (!fir.box<!fir.array<?x?xf32>>) -> !fir.heap<!fir.array<?x?xf32>>
+! CHECK-BUFFERING-NEXT: fir.freemem %[[MUL_RES_HEAP]]
 
 ! CHECK-ALL-NEXT:       return
 ! CHECK-ALL-NEXT:     }

diff  --git a/flang/test/Lower/HLFIR/associate-for-args-with-alloc-components.f90 b/flang/test/Lower/HLFIR/associate-for-args-with-alloc-components.f90
new file mode 100644
index 000000000000000..abb3e627999dfab
--- /dev/null
+++ b/flang/test/Lower/HLFIR/associate-for-args-with-alloc-components.f90
@@ -0,0 +1,65 @@
+! Test that the hlfir.end_associate generated for the argument
+! passing has operand that is a Fortran entity, so that
+! the shape/type-params information is available
+! during bufferization that will have to generate a runtime call
+! for deallocating the allocatable component of the temporary.
+!
+! RUN: bbc -emit-hlfir -o - -I nowhere %s | FileCheck %s
+
+module types
+  type t
+     real, allocatable :: x
+  end type t
+contains
+end module types
+
+subroutine test1(x)
+  use types
+  interface
+     subroutine callee1(x)
+       use types
+       type(t), value :: x(10)
+     end subroutine callee1
+  end interface
+  type(t) :: x(:)
+  call callee1(x)
+end subroutine test1
+! CHECK-LABEL:   func.func @_QPtest1(
+! CHECK:           %[[VAL_6:.*]]:3 = hlfir.associate %{{.*}}(%{{.*}}) {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>, !fir.shape<1>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, !fir.ref<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1)
+! CHECK:           hlfir.end_associate %[[VAL_6]]#0, %[[VAL_6]]#2 : !fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1
+
+subroutine test2(x)
+  use types
+  interface
+     subroutine callee2(x)
+       use types
+       type(t) :: x(:)
+     end subroutine callee2
+  end interface
+  type(t) :: x(:)
+  call callee2((x))
+end subroutine test2
+! CHECK-LABEL:   func.func @_QPtest2(
+! CHECK:           %[[VAL_9:.*]]:3 = hlfir.associate %{{.*}}(%{{.*}}) {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>, !fir.shape<1>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, !fir.ref<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1)
+! CHECK:           hlfir.end_associate %[[VAL_9]]#0, %[[VAL_9]]#2 : !fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1
+
+subroutine test3(x)
+  use types
+  interface
+     subroutine callee3(x)
+       use types
+       type(t), optional, value :: x(10)
+     end subroutine callee3
+  end interface
+  type(t), optional :: x(:)
+  call callee3(x)
+end subroutine test3
+! CHECK-LABEL:   func.func @_QPtest3(
+! CHECK:           %[[VAL_3:.*]]:3 = fir.if %{{.*}} -> (!fir.ref<!fir.array<10x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1) {
+! CHECK:             %[[VAL_8:.*]]:3 = hlfir.associate %{{.*}}(%{{.*}}) {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>, !fir.shape<1>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, !fir.ref<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1)
+! CHECK:             %[[VAL_9:.*]] = fir.convert %[[VAL_8]]#1 : (!fir.ref<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>) -> !fir.ref<!fir.array<10x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>
+! CHECK:             fir.result %[[VAL_9]], %[[VAL_8]]#0, %[[VAL_8]]#2 : !fir.ref<!fir.array<10x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1
+! CHECK:           } else {
+! CHECK:             fir.result %{{.*}}, %{{.*}}, %{{.*}} : !fir.ref<!fir.array<10x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, !fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1
+! CHECK:           }
+! CHECK:           hlfir.end_associate %[[VAL_3]]#1, %[[VAL_3]]#2 : !fir.box<!fir.array<?x!fir.type<_QMtypesTt{x:!fir.box<!fir.heap<f32>>}>>>, i1

diff  --git a/flang/test/Lower/HLFIR/elemental-call-with-finalization.f90 b/flang/test/Lower/HLFIR/elemental-call-with-finalization.f90
new file mode 100644
index 000000000000000..181c6b74e91096e
--- /dev/null
+++ b/flang/test/Lower/HLFIR/elemental-call-with-finalization.f90
@@ -0,0 +1,60 @@
+! Test HLFIR lowering of user defined elemental procedure references
+! with finalizable results. Verify that the elemental results
+! are not destroyed inside hlfir.elemental.
+! RUN: bbc -emit-hlfir -o - -I nowhere %s 2>&1 | FileCheck %s
+
+module types
+  type t
+   contains
+     final :: finalize
+  end type t
+contains
+  pure subroutine finalize(x)
+    type(t), intent(inout) :: x
+  end subroutine finalize
+end module types
+
+subroutine test1(x)
+  use types
+  interface
+     elemental function elem(x)
+       use types
+       type(t), intent(in) :: x
+       type(t) :: elem
+     end function elem
+  end interface
+  type(t) :: x(:)
+  x = elem(x)
+end subroutine test1
+! CHECK-LABEL:   func.func @_QPtest1(
+! CHECK:           %[[VAL_6:.*]] = hlfir.elemental %{{.*}} : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt>> {
+! CHECK-NOT: fir.call @_FortranADestroy
+! CHECK:           hlfir.destroy %[[VAL_6]] finalize : !hlfir.expr<?x!fir.type<_QMtypesTt>>
+
+subroutine test2(x)
+  use types
+  interface
+     elemental function elem(x)
+       use types
+       type(t), intent(in) :: x
+       type(t) :: elem
+     end function elem
+     elemental function elem2(x, y)
+       use types
+       type(t), intent(in) :: x, y
+       type(t) :: elem2
+     end function elem2
+  end interface
+  type(t) :: x(:)
+  x = elem2(elem(x), elem(x))
+end subroutine test2
+! CHECK-LABEL:   func.func @_QPtest2(
+! CHECK:           %[[VAL_8:.*]] = hlfir.elemental %{{.*}} : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt>> {
+! CHECK-NOT: fir.call @_FortranADestroy
+! CHECK:           %[[VAL_16:.*]] = hlfir.elemental %{{.*}} : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt>> {
+! CHECK-NOT: fir.call @_FortranADestroy
+! CHECK:           %[[VAL_23:.*]] = hlfir.elemental %{{.*}} : (!fir.shape<1>) -> !hlfir.expr<?x!fir.type<_QMtypesTt>> {
+! CHECK-NOT: fir.call @_FortranADestroy
+! CHECK:           hlfir.destroy %[[VAL_23]] finalize : !hlfir.expr<?x!fir.type<_QMtypesTt>>
+! CHECK:           hlfir.destroy %[[VAL_16]] finalize : !hlfir.expr<?x!fir.type<_QMtypesTt>>
+! CHECK:           hlfir.destroy %[[VAL_8]] finalize : !hlfir.expr<?x!fir.type<_QMtypesTt>>

diff  --git a/flang/test/Lower/HLFIR/poly_expr_for_nonpoly_dummy.f90 b/flang/test/Lower/HLFIR/poly_expr_for_nonpoly_dummy.f90
index 821f722cf3d3a35..0d21634dd84daad 100644
--- a/flang/test/Lower/HLFIR/poly_expr_for_nonpoly_dummy.f90
+++ b/flang/test/Lower/HLFIR/poly_expr_for_nonpoly_dummy.f90
@@ -28,7 +28,7 @@ end subroutine test1
 ! CHECK:           %[[VAL_29:.*]]:2 = hlfir.copy_in %[[VAL_28]] : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt>>>, i1)
 ! CHECK:           fir.call @_QMtypesPcallee(%[[VAL_29]]#0) fastmath<contract> : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt>>>) -> ()
 ! CHECK:           hlfir.copy_out %[[VAL_29]]#0, %[[VAL_29]]#1 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt>>>, i1) -> ()
-! CHECK:           hlfir.end_associate %[[VAL_27]]#1, %[[VAL_27]]#2 : !fir.class<!fir.heap<!fir.array<?x!fir.type<_QMtypesTt>>>>, i1
+! CHECK:           hlfir.end_associate %[[VAL_27]]#0, %[[VAL_27]]#2 : !fir.class<!fir.heap<!fir.array<?x!fir.type<_QMtypesTt>>>>, i1
 ! CHECK:           hlfir.destroy %[[VAL_23]] : !hlfir.expr<?x!fir.type<_QMtypesTt>?>
 
 subroutine test2(x)
@@ -43,5 +43,5 @@ end subroutine test2
 ! CHECK:           %[[VAL_11:.*]]:2 = hlfir.copy_in %[[VAL_10]] : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt>>>) -> (!fir.box<!fir.array<?x!fir.type<_QMtypesTt>>>, i1)
 ! CHECK:           fir.call @_QMtypesPcallee(%[[VAL_11]]#0) fastmath<contract> : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt>>>) -> ()
 ! CHECK:           hlfir.copy_out %[[VAL_11]]#0, %[[VAL_11]]#1 : (!fir.box<!fir.array<?x!fir.type<_QMtypesTt>>>, i1) -> ()
-! CHECK:           hlfir.end_associate %[[VAL_9]]#1, %[[VAL_9]]#2 : !fir.class<!fir.heap<!fir.array<?x!fir.type<_QMtypesTt>>>>, i1
+! CHECK:           hlfir.end_associate %[[VAL_9]]#0, %[[VAL_9]]#2 : !fir.class<!fir.heap<!fir.array<?x!fir.type<_QMtypesTt>>>>, i1
 ! CHECK:           hlfir.destroy %[[VAL_5]] : !hlfir.expr<?x!fir.type<_QMtypesTt>?>

diff  --git a/flang/test/Lower/HLFIR/polymorphic-expressions.f90 b/flang/test/Lower/HLFIR/polymorphic-expressions.f90
index e83378255c14f50..f4691ed557fee7e 100644
--- a/flang/test/Lower/HLFIR/polymorphic-expressions.f90
+++ b/flang/test/Lower/HLFIR/polymorphic-expressions.f90
@@ -29,5 +29,5 @@ end subroutine test1
 ! CHECK:           %[[VAL_27:.*]]:3 = hlfir.associate %[[VAL_23]](%[[VAL_26]]) {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>?>, !fir.shape<1>) -> (!fir.class<!fir.heap<!fir.array<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>>>>, !fir.class<!fir.heap<!fir.array<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>>>>, i1)
 ! CHECK:           %[[VAL_28:.*]] = fir.rebox %[[VAL_27]]#0 : (!fir.class<!fir.heap<!fir.array<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>>>>) -> !fir.class<!fir.array<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>>>
 ! CHECK:           fir.call @_QPcallee(%[[VAL_28]]) fastmath<contract> : (!fir.class<!fir.array<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>>>) -> ()
-! CHECK:           hlfir.end_associate %[[VAL_27]]#1, %[[VAL_27]]#2 : !fir.class<!fir.heap<!fir.array<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>>>>, i1
+! CHECK:           hlfir.end_associate %[[VAL_27]]#0, %[[VAL_27]]#2 : !fir.class<!fir.heap<!fir.array<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>>>>, i1
 ! CHECK:           hlfir.destroy %[[VAL_23]] : !hlfir.expr<?x!fir.type<_QMpolymorphic_expressions_typesTt{c:i32}>?>


        


More information about the flang-commits mailing list