[flang-commits] [flang] 4868d66 - [flang] improve DITypeAttr caching with recursive derived types (#146543)

Thu Jul 3 05:09:05 PDT 2025

Author: jeanPerier
Date: 2025-07-03T14:09:01+02:00
New Revision: 4868d66282b231f22b464471e9a16a1ec2da015e

URL: https://github.com/llvm/llvm-project/commit/4868d66282b231f22b464471e9a16a1ec2da015e
DIFF: https://github.com/llvm/llvm-project/commit/4868d66282b231f22b464471e9a16a1ec2da015e.diff

LOG: [flang] improve DITypeAttr caching with recursive derived types (#146543)

The current DITypeAttr caching strategy for derived type debug metadata
generation is not optimal. This turns out to be an issue for compile
times in apps with very complex derived types, like CP2K.

See the added debug-cyclic-derived-type-caching-simple.f90 test for more
details about the duplication issue.

As a real-world example justifying the new non-trivial caching strategy:
in CP2K, emitting debug type info for `swarm_worker_type` in swarm_worker.F
caused 1,747,347 LLVM debug metadata nodes to be emitted, versus 8023
after this patch (over 200x fewer). This leads to noticeable compile time
improvements (I measured 0.12s spent in the `AddDebugInfo` pass instead of
7.5s prior to this patch).

The main idea is that the cache now associates each cached DITypeAttr
tree for a derived type with the list of parent nodes that this
DITypeAttr refers to recursively via indices.

When leaving the context of a parent node, all types that were cached
and linked to this parent node are cleared from the cache.
This allows more reuse in sub-trees while still fulfilling the MLIR
requirement that a DITypeAttr referring to a parent DITypeAttr via an
integer id may only be used inside the DITypeAttr of that parent.
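A minimal sketch of this lifetime rule, assuming a toy setting where types are identified by name and a DITypeAttr is modeled as an `int` id (`ScopedTypeCache`, `insert`, and `leave` are hypothetical names, not the flang API). An entry that refers back to enclosing nodes is tied to the depth of the deepest one, since that node is the first of them to be left; entries with no such reference stay valid for the whole compilation.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch (not the flang API): cache entries may be tied to the
// depth of the deepest enclosing node they refer back to. They are dropped
// when the traversal leaves that node; untied entries live forever.
class ScopedTypeCache {
public:
  std::optional<int> lookup(const std::string &type) const {
    auto it = entries.find(type);
    if (it == entries.end())
      return std::nullopt;
    return it->second;
  }

  // parentDepth == 0: the attr refers to no enclosing node (always valid).
  void insert(const std::string &type, int attr, int parentDepth) {
    assert(parentDepth >= 0 && "depths are counted from 1");
    entries[type] = attr;
    if (parentDepth > 0)
      cleanups[parentDepth].push_back(type);
  }

  // Called when the traversal finishes the node at `depth`.
  void leave(int depth) {
    auto it = cleanups.find(depth);
    if (it == cleanups.end())
      return;
    for (const std::string &type : it->second)
      entries.erase(type);
    cleanups.erase(it);
  }

private:
  std::map<std::string, int> entries;
  std::map<int, std::vector<std::string>> cleanups;
};
```

The patch itself nulls out cleared entries instead of erasing them and keeps its cleanup list as a sorted small vector; the sketch only illustrates when entries stop being valid.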

Most of the complexity comes from computing the "list of parent nodes"
by merging the ones from the components.
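A sketch of that merge on sorted lists of parent depths, assuming `std::vector` in place of the `llvm::SmallVector<int32_t, 1>` used by the patch; `mergeComponentLevels` and `trimSelfReferences` are illustrative names that loosely mirror `postComponentVisitUpdate` and the trimming step in `finalize`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

using Levels = std::vector<int32_t>; // sorted depths of referenced parents

// Fold one component's referenced parent depths into the running list of
// the enclosing type, keeping it sorted (duplicates are kept).
void mergeComponentLevels(Levels &accumulated, const Levels &component) {
  assert(std::is_sorted(component.begin(), component.end()));
  if (component.empty())
    return; // no recursion below this component: nothing to merge
  Levels old;
  old.swap(accumulated);
  std::merge(component.begin(), component.end(), old.begin(), old.end(),
             std::back_inserter(accumulated));
}

// When finalizing the type at `currentDepth`, references to itself are
// resolved inside its own attr, so drop every depth >= currentDepth.
void trimSelfReferences(Levels &levels, int32_t currentDepth) {
  auto first = std::find_if(levels.begin(), levels.end(),
                            [&](int32_t d) { return d >= currentDepth; });
  levels.erase(first, levels.end());
}
```

Duplicates are kept, as with `std::merge` in the patch; they do not change which depths are present.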

This is done in such a way that the extra cost for apps without
recursive derived types is minimal: the extra data structures
should not require extra dynamic allocations when there is little or no
recursion.

Example:

Take the following type graph (the Fortran source for it is in the added
debug-cyclic-derived-type-caching-complex.f90).
A is the top-level type, and has direct components of types B, C, and
E.
There are cycles in the type tree introduced by types B and D.
Types `C` and `E` are of interest here because they are in the middle of
those cycles and appear in several places in the type tree. Their
occurrences are labeled in brackets in the order they are visited by the
DebugTypeGenerator.

```
 A -> B -> C [1] -> D -> E [1] -> F -> G -> B
 |   |              |             |
 |   |              |             | -> D
 |   |              |
 |   |              | -> H -> E [2] ->  F -> G -> B
 |   |                                  |
 |   |                                  |-> D
 |   |
 |   | -> I -> E [3] ->  F -> G -> B
 |   |                   |
 |   |                   |-> D
 |   | -> C [2]
 |
 | -> C [3] -> D
 | -> E [4] -> F -> G -> B
               |
               | -> D
```
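The reuse question for any occurrence in this graph comes down to whether the parent nodes that a cached DITypeAttr refers back to still enclose the new occurrence. A hypothetical sketch of that check, where the current path is the list of enclosing types from the top-level type down (`canReuse` and `ParentRef` are illustrative names, not part of flang):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A parent that a cached attr refers back to: its 1-based depth on the path
// from the top-level type, and its type name.
using ParentRef = std::pair<std::size_t, std::string>;

// A cached attr can be reused at `currentPath` (the enclosing types, top
// level first) only if every parent it refers to is still an enclosing node
// at the same depth. An attr with no parent references is reusable anywhere.
bool canReuse(const std::vector<std::string> &currentPath,
              const std::vector<ParentRef> &refs) {
  for (const auto &[depth, name] : refs) {
    assert(depth >= 1 && "depths are 1-based");
    if (depth > currentPath.size() || currentPath[depth - 1] != name)
      return false;
  }
  return true;
}
```

For example, C[1] refers back only to B (A is at depth 1, B at depth 2), so the check passes for C[2] under I (path A, B, I) and fails for C[3] directly under A (path A).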

With this patch, E[1] and E[2] can share the same DITypeAttr, as can
C[1] and C[2], while previously they all got their own nodes.

To be safe with regard to cycles in MLIR, a DITypeAttr created for a
node N2 that sits under a node N1 being recursively referred to, and
above the recursive reference to N1, must not be used above N1 in the
DITypeAttr tree. It can, however, be used in several places under N1.

Hence here:
- E[2] can reuse the E[1] DITypeAttr because both are under B and D,
  the nodes that E[1] refers back to.
- E[3] cannot reuse the E[1] DITypeAttr because it is not under D.
- E[4] cannot reuse the E[3] DITypeAttr because it is above B.
- C[2] can reuse the C[1] DITypeAttr because both are under B, but C[3]
  cannot because it is above B.

This is achieved by this patch because, when visiting A and reaching B
and then G, the recursive reference from G back to B is registered in the
visit context. D is added to this context when going back up through F,
which refers back to D. So when returning to E[1] with the information to
build its DITypeAttr, its recursive references (B and D) are known and
saved along with the DITypeAttr in the cache.

Since D is the deepest node that E[1] refers back to, its cache entry
stays valid until D is finished, so E[2] (under H, which is under D)
reuses it. When returning to D, the cache entry for E is cleared because
it is known to depend on D. A new DITypeAttr is then created for E[3],
and this time it only depends on B because the D under E[3] is not a
recursive reference (D is not above E[3]). The cache entry for E[3] is
cleared when returning to B, which leads to a new DITypeAttr being
created for E[4].
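To check the walkthrough mechanically, the sketch below re-implements the caching logic of `DerivedTypeCache` from the diff in a toy setting: types are names, components are edges of the type graph, a "DITypeAttr" is an `int` id (0 means none, -1 a placeholder), and `std` containers replace the LLVM ADTs. `ToyDerivedTypeCache`, `convert`, and `countTranslations` are hypothetical names; this is a simplified model, not flang code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model, not flang code: a type is a name, an "attr" is an int id
// (0 = none, -1 = placeholder for a type still being translated).
using Levels = std::vector<int32_t>;
using Graph = std::map<std::string, std::vector<std::string>>;

class ToyDerivedTypeCache {
public:
  int lookup(const std::string &ty) {
    auto it = cache.find(ty);
    if (it == cache.end() || it->second.first == 0)
      return 0;
    componentLevels = it->second.second; // parents the hit refers back to
    return it->second.first;
  }
  void startTranslating(const std::string &ty) {
    ++depth;
    cache[ty] = {-1, Levels{depth}}; // a hit on it refers back to `depth`
  }
  void preComponentVisit() { componentLevels.clear(); }
  void postComponentVisit(Levels &nested) {
    Levels old;
    old.swap(nested);
    std::merge(componentLevels.begin(), componentLevels.end(), old.begin(),
               old.end(), std::back_inserter(nested));
  }
  void finalize(const std::string &ty, int attr, Levels nested) {
    // References to this type itself are resolved inside its own attr.
    nested.erase(std::find_if(nested.begin(), nested.end(),
                              [&](int32_t d) { return d >= depth; }),
                 nested.end());
    componentLevels = nested;
    cache[ty] = {attr, nested};
    cleanUp(depth);
    if (!nested.empty()) // invalidate when leaving the deepest parent
      scheduleCleanUp(ty, nested.back());
    --depth;
  }

private:
  void scheduleCleanUp(const std::string &ty, int32_t d) {
    auto it = std::find_if(cleanups.begin(), cleanups.end(),
                           [&](const auto &c) { return c.second >= d; });
    if (it != cleanups.end() && it->second == d)
      it->first.push_back(ty);
    else
      cleanups.insert(it, std::make_pair(std::vector<std::string>{ty}, d));
  }
  void cleanUp(int32_t d) {
    if (cleanups.empty())
      return;
    assert(cleanups.back().second <= d && "deeper cleanups already done");
    if (cleanups.back().second != d)
      return;
    for (const std::string &ty : cleanups.back().first)
      cache[ty].first = 0;
    cleanups.pop_back();
  }
  int32_t depth = 0;
  Levels componentLevels;
  std::map<std::string, std::pair<int, Levels>> cache;
  std::vector<std::pair<std::vector<std::string>, int32_t>> cleanups;
};

int convert(const Graph &graph, ToyDerivedTypeCache &cache,
            std::map<std::string, int> &built, int &nextId,
            const std::string &ty) {
  if (int attr = cache.lookup(ty))
    return attr; // reuse a finished attr or a placeholder
  cache.startTranslating(ty);
  Levels nested;
  for (const std::string &component : graph.at(ty)) {
    cache.preComponentVisit();
    convert(graph, cache, built, nextId, component);
    cache.postComponentVisit(nested);
  }
  ++built[ty];
  int attr = nextId++;
  cache.finalize(ty, attr, std::move(nested));
  return attr;
}

// Translate each root in order; return how many attrs were built per type.
std::map<std::string, int>
countTranslations(const Graph &graph, const std::vector<std::string> &roots) {
  ToyDerivedTypeCache cache;
  std::map<std::string, int> built;
  int nextId = 1;
  for (const std::string &root : roots)
    convert(graph, cache, built, nextId, root);
  return built;
}
```

Run on the example graph with the roots a, c, and e in that order (the variables xa, xc, and xe of the test), this model builds 19 attrs in total, four of them for e and two for c, which is consistent with the 19 `distinct !DICompositeType` CHECK lines in debug-cyclic-derived-type-caching-complex.f90. On the graph of the simple test, it builds t0 twice.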

Added: 
    flang/test/Integration/debug-cyclic-derived-type-caching-complex.f90
    flang/test/Integration/debug-cyclic-derived-type-caching-simple.f90

Modified: 
    flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
    flang/lib/Optimizer/Transforms/DebugTypeGenerator.h

Removed: 
    


################################################################################
diff  --git a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
index cdd30dce183dd..a848058486e2c 100644

--- a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
+++ b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
@@ -48,8 +48,7 @@ DebugTypeGenerator::DebugTypeGenerator(mlir::ModuleOp m,
                                        mlir::SymbolTable *symbolTable_,
                                        const mlir::DataLayout &dl)
     : module(m), symbolTable(symbolTable_), dataLayout{&dl},
-      kindMapping(getKindMapping(m)), llvmTypeConverter(m, false, false, dl),
-      derivedTypeDepth(0) {
+      kindMapping(getKindMapping(m)), llvmTypeConverter(m, false, false, dl) {
   LLVM_DEBUG(llvm::dbgs() << "DITypeAttr generator\n");
 
   mlir::MLIRContext *context = module.getContext();
@@ -272,31 +271,127 @@ DebugTypeGenerator::getFieldSizeAndAlign(mlir::Type fieldTy) {
   return std::pair{byteSize, byteAlign};
 }
 
+mlir::LLVM::DITypeAttr DerivedTypeCache::lookup(mlir::Type type) {
+  auto iter = typeCache.find(type);
+  if (iter != typeCache.end()) {
+    if (iter->second.first) {
+      componentActiveRecursionLevels = iter->second.second;
+    }
+    return iter->second.first;
+  }
+  return nullptr;
+}
+
+DerivedTypeCache::ActiveLevels
+DerivedTypeCache::startTranslating(mlir::Type type,
+                                   mlir::LLVM::DITypeAttr placeHolder) {
+  derivedTypeDepth++;
+  if (!placeHolder)
+    return {};
+  typeCache[type] = std::pair<mlir::LLVM::DITypeAttr, ActiveLevels>(
+      placeHolder, {derivedTypeDepth});
+  return {};
+}
+
+void DerivedTypeCache::preComponentVisitUpdate() {
+  componentActiveRecursionLevels.clear();
+}
+
+void DerivedTypeCache::postComponentVisitUpdate(
+    ActiveLevels &activeRecursionLevels) {
+  if (componentActiveRecursionLevels.empty())
+    return;
+  ActiveLevels oldLevels;
+  oldLevels.swap(activeRecursionLevels);
+  std::merge(componentActiveRecursionLevels.begin(),
+             componentActiveRecursionLevels.end(), oldLevels.begin(),
+             oldLevels.end(), std::back_inserter(activeRecursionLevels));
+}
+
+void DerivedTypeCache::finalize(mlir::Type ty, mlir::LLVM::DITypeAttr attr,
+                                ActiveLevels &&activeRecursionLevels) {
+  // If there is no nested recursion or if this type does not point to any type
+  // nodes above it, it is safe to cache it indefinitely (it can be used in any
+  // contexts).
+  if (activeRecursionLevels.empty() ||
+      (activeRecursionLevels[0] == derivedTypeDepth)) {
+    typeCache[ty] = std::pair<mlir::LLVM::DITypeAttr, ActiveLevels>(attr, {});
+    componentActiveRecursionLevels.clear();
+    cleanUpCache(derivedTypeDepth);
+    --derivedTypeDepth;
+    return;
+  }
+  // Trim any recursion below the current type.
+  if (activeRecursionLevels.back() >= derivedTypeDepth) {
+    auto last = llvm::find_if(activeRecursionLevels, [&](std::int32_t depth) {
+      return depth >= derivedTypeDepth;
+    });
+    if (last != activeRecursionLevels.end()) {
+      activeRecursionLevels.erase(last, activeRecursionLevels.end());
+    }
+  }
+  componentActiveRecursionLevels = std::move(activeRecursionLevels);
+  typeCache[ty] = std::pair<mlir::LLVM::DITypeAttr, ActiveLevels>(
+      attr, componentActiveRecursionLevels);
+  cleanUpCache(derivedTypeDepth);
+  if (!componentActiveRecursionLevels.empty())
+    insertCacheCleanUp(ty, componentActiveRecursionLevels.back());
+  --derivedTypeDepth;
+}
+
+void DerivedTypeCache::insertCacheCleanUp(mlir::Type type, int32_t depth) {
+  auto iter = llvm::find_if(cacheCleanupList,
+                            [&](const auto &x) { return x.second >= depth; });
+  if (iter == cacheCleanupList.end()) {
+    cacheCleanupList.emplace_back(
+        std::pair<llvm::SmallVector<mlir::Type>, int32_t>({type}, depth));
+    return;
+  }
+  if (iter->second == depth) {
+    iter->first.push_back(type);
+    return;
+  }
+  cacheCleanupList.insert(
+      iter, std::pair<llvm::SmallVector<mlir::Type>, int32_t>({type}, depth));
+}
+
+void DerivedTypeCache::cleanUpCache(int32_t depth) {
+  if (cacheCleanupList.empty())
+    return;
+  // cleanups are done in the post actions when visiting a derived type
+  // tree. So if there is a clean-up for the current depth, it has to be
+  // the last one (deeper ones must have been done already).
+  if (cacheCleanupList.back().second == depth) {
+    for (mlir::Type type : cacheCleanupList.back().first)
+      typeCache[type].first = nullptr;
+    cacheCleanupList.pop_back_n(1);
+  }
+}
+
 mlir::LLVM::DITypeAttr DebugTypeGenerator::convertRecordType(
     fir::RecordType Ty, mlir::LLVM::DIFileAttr fileAttr,
     mlir::LLVM::DIScopeAttr scope, fir::cg::XDeclareOp declOp) {
-  // Check if this type has already been converted.
-  auto iter = typeCache.find(Ty);
-  if (iter != typeCache.end())
-    return iter->second;
 
-  bool canCacheThisType = true;
-  llvm::SmallVector<mlir::LLVM::DINodeAttr> elements;
+  if (mlir::LLVM::DITypeAttr attr = derivedTypeCache.lookup(Ty))
+    return attr;
+
   mlir::MLIRContext *context = module.getContext();
-  auto recId = mlir::DistinctAttr::create(mlir::UnitAttr::get(context));
+  auto [nameKind, sourceName] = fir::NameUniquer::deconstruct(Ty.getName());
+  if (nameKind != fir::NameUniquer::NameKind::DERIVED_TYPE)
+    return genPlaceholderType(context);
+
+  llvm::SmallVector<mlir::LLVM::DINodeAttr> elements;
   // Generate a place holder TypeAttr which will be used if a member
   // references the parent type.
-  auto comAttr = mlir::LLVM::DICompositeTypeAttr::get(
+  auto recId = mlir::DistinctAttr::create(mlir::UnitAttr::get(context));
+  auto placeHolder = mlir::LLVM::DICompositeTypeAttr::get(
       context, recId, /*isRecSelf=*/true, llvm::dwarf::DW_TAG_structure_type,
       mlir::StringAttr::get(context, ""), fileAttr, /*line=*/0, scope,
       /*baseType=*/nullptr, mlir::LLVM::DIFlags::Zero, /*sizeInBits=*/0,
       /*alignInBits=*/0, elements, /*dataLocation=*/nullptr, /*rank=*/nullptr,
       /*allocated=*/nullptr, /*associated=*/nullptr);
-  typeCache[Ty] = comAttr;
-
-  auto result = fir::NameUniquer::deconstruct(Ty.getName());
-  if (result.first != fir::NameUniquer::NameKind::DERIVED_TYPE)
-    return genPlaceholderType(context);
+  DerivedTypeCache::ActiveLevels nestedRecursions =
+      derivedTypeCache.startTranslating(Ty, placeHolder);
 
   fir::TypeInfoOp tiOp = symbolTable->lookup<fir::TypeInfoOp>(Ty.getName());
   unsigned line = (tiOp) ? getLineFromLoc(tiOp.getLoc()) : 1;
@@ -305,6 +400,7 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertRecordType(
   mlir::IntegerType intTy = mlir::IntegerType::get(context, 64);
   std::uint64_t offset = 0;
   for (auto [fieldName, fieldTy] : Ty.getTypeList()) {
+    derivedTypeCache.preComponentVisitUpdate();
     auto [byteSize, byteAlign] = getFieldSizeAndAlign(fieldTy);
     std::optional<llvm::ArrayRef<int64_t>> lowerBounds =
         fir::getComponentLowerBoundsIfNonDefault(Ty, fieldName, module,
@@ -317,7 +413,7 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertRecordType(
     mlir::LLVM::DITypeAttr elemTy;
     if (lowerBounds && seqTy &&
         lowerBounds->size() == seqTy.getShape().size()) {
-      llvm::SmallVector<mlir::LLVM::DINodeAttr> elements;
+      llvm::SmallVector<mlir::LLVM::DINodeAttr> arrayElements;
       for (auto [bound, dim] :
            llvm::zip_equal(*lowerBounds, seqTy.getShape())) {
         auto countAttr = mlir::IntegerAttr::get(intTy, llvm::APInt(64, dim));
@@ -325,14 +421,14 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertRecordType(
         auto subrangeTy = mlir::LLVM::DISubrangeAttr::get(
             context, countAttr, lowerAttr, /*upperBound=*/nullptr,
             /*stride=*/nullptr);
-        elements.push_back(subrangeTy);
+        arrayElements.push_back(subrangeTy);
       }
       elemTy = mlir::LLVM::DICompositeTypeAttr::get(
           context, llvm::dwarf::DW_TAG_array_type, /*name=*/nullptr,
           /*file=*/nullptr, /*line=*/0, /*scope=*/nullptr,
           convertType(seqTy.getEleTy(), fileAttr, scope, declOp),
           mlir::LLVM::DIFlags::Zero, /*sizeInBits=*/0, /*alignInBits=*/0,
-          elements, /*dataLocation=*/nullptr, /*rank=*/nullptr,
+          arrayElements, /*dataLocation=*/nullptr, /*rank=*/nullptr,
           /*allocated=*/nullptr, /*associated=*/nullptr);
     } else
       elemTy = convertType(fieldTy, fileAttr, scope, /*declOp=*/nullptr);
@@ -344,80 +440,18 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertRecordType(
         /*extra data=*/nullptr);
     elements.push_back(tyAttr);
     offset += llvm::alignTo(byteSize, byteAlign);
-
-    // Currently, the handling of recursive debug type in mlir has some
-    // limitations that were discussed at the end of the thread for following
-    // PR.
-    // https://github.com/llvm/llvm-project/pull/106571
-    //
-    // Problem could be explained with the following example code:
-    //  type t2
-    //   type(t1), pointer :: p1
-    // end type
-    // type t1
-    //   type(t2), pointer :: p2
-    // end type
-    // In the description below, type_self means a temporary type that is
-    // generated
-    // as a place holder while the members of that type are being processed.
-    //
-    // If we process t1 first then we will have the following structure after
-    // it has been processed.
-    // t1 -> t2 -> t1_self
-    // This is because when we started processing t2, we did not have the
-    // complete t1 but its place holder t1_self.
-    // Now if some entity requires t2, we will already have that in cache and
-    // will return it. But this t2 refers to t1_self and not to t1. In mlir
-    // handling, only those types are allowed to have _self reference which are
-    // wrapped by entity whose reference it is. So t1 -> t2 -> t1_self is ok
-    // because the t1_self reference can be resolved by the outer t1. But
-    // standalone t2 is not because there will be no way to resolve it. Until
-    // this is fixed in mlir, we avoid caching such types. Please see
-    // DebugTranslation::translateRecursive for details on how mlir handles
-    // recursive types.
-    // The code below checks for situation where it will be unsafe to cache
-    // a type to avoid this problem. We do that in 2 situations.
-    // 1. If a member is record type, then its type would have been processed
-    // before reaching here. If it is not in the cache, it means that it was
-    // found to be unsafe to cache. So any type containing it will also not
-    // be cached
-    // 2. The type of the member is found in the cache but it is a place holder.
-    // In this case, its recID should match the recID of the type we are
-    // processing. This helps us to cache the following type.
-    // type t
-    //  type(t), allocatable :: p
-    // end type
-    mlir::Type baseTy = getDerivedType(fieldTy);
-    if (auto recTy = mlir::dyn_cast<fir::RecordType>(baseTy)) {
-      auto iter = typeCache.find(recTy);
-      if (iter == typeCache.end())
-        canCacheThisType = false;
-      else {
-        if (auto tyAttr =
-                mlir::dyn_cast<mlir::LLVM::DICompositeTypeAttr>(iter->second)) {
-          if (tyAttr.getIsRecSelf() && tyAttr.getRecId() != recId)
-            canCacheThisType = false;
-        }
-      }
-    }
+    derivedTypeCache.postComponentVisitUpdate(nestedRecursions);
   }
 
   auto finalAttr = mlir::LLVM::DICompositeTypeAttr::get(
       context, recId, /*isRecSelf=*/false, llvm::dwarf::DW_TAG_structure_type,
-      mlir::StringAttr::get(context, result.second.name), fileAttr, line, scope,
+      mlir::StringAttr::get(context, sourceName.name), fileAttr, line, scope,
       /*baseType=*/nullptr, mlir::LLVM::DIFlags::Zero, offset * 8,
       /*alignInBits=*/0, elements, /*dataLocation=*/nullptr, /*rank=*/nullptr,
       /*allocated=*/nullptr, /*associated=*/nullptr);
 
-  // derivedTypeDepth == 1 means that it is a top level type which is safe to
-  // cache.
-  if (canCacheThisType || derivedTypeDepth == 1) {
-    typeCache[Ty] = finalAttr;
-  } else {
-    auto iter = typeCache.find(Ty);
-    if (iter != typeCache.end())
-      typeCache.erase(iter);
-  }
+  derivedTypeCache.finalize(Ty, finalAttr, std::move(nestedRecursions));
+
   return finalAttr;
 }
 
@@ -425,15 +459,18 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertTupleType(
     mlir::TupleType Ty, mlir::LLVM::DIFileAttr fileAttr,
     mlir::LLVM::DIScopeAttr scope, fir::cg::XDeclareOp declOp) {
   // Check if this type has already been converted.
-  auto iter = typeCache.find(Ty);
-  if (iter != typeCache.end())
-    return iter->second;
+  if (mlir::LLVM::DITypeAttr attr = derivedTypeCache.lookup(Ty))
+    return attr;
+
+  DerivedTypeCache::ActiveLevels nestedRecursions =
+      derivedTypeCache.startTranslating(Ty);
 
   llvm::SmallVector<mlir::LLVM::DINodeAttr> elements;
   mlir::MLIRContext *context = module.getContext();
 
   std::uint64_t offset = 0;
   for (auto fieldTy : Ty.getTypes()) {
+    derivedTypeCache.preComponentVisitUpdate();
     auto [byteSize, byteAlign] = getFieldSizeAndAlign(fieldTy);
     mlir::LLVM::DITypeAttr elemTy =
         convertType(fieldTy, fileAttr, scope, /*declOp=*/nullptr);
@@ -445,6 +482,7 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertTupleType(
         /*extra data=*/nullptr);
     elements.push_back(tyAttr);
     offset += llvm::alignTo(byteSize, byteAlign);
+    derivedTypeCache.postComponentVisitUpdate(nestedRecursions);
   }
 
   auto typeAttr = mlir::LLVM::DICompositeTypeAttr::get(
@@ -453,7 +491,7 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertTupleType(
       /*baseType=*/nullptr, mlir::LLVM::DIFlags::Zero, offset * 8,
       /*alignInBits=*/0, elements, /*dataLocation=*/nullptr, /*rank=*/nullptr,
       /*allocated=*/nullptr, /*associated=*/nullptr);
-  typeCache[Ty] = typeAttr;
+  derivedTypeCache.finalize(Ty, typeAttr, std::move(nestedRecursions));
   return typeAttr;
 }
 
@@ -667,27 +705,7 @@ DebugTypeGenerator::convertType(mlir::Type Ty, mlir::LLVM::DIFileAttr fileAttr,
     return convertCharacterType(charTy, fileAttr, scope, declOp,
                                 /*hasDescriptor=*/false);
   } else if (auto recTy = mlir::dyn_cast_if_present<fir::RecordType>(Ty)) {
-    // For nested derived types like shown below, the call sequence of the
-    // convertRecordType will look something like as follows:
-    // convertRecordType (t1)
-    //  convertRecordType (t2)
-    //    convertRecordType (t3)
-    // We need to recognize when we are processing the top level type like t1
-    // to make caching decision. The variable `derivedTypeDepth` is used for
-    // this purpose and maintains the current depth of derived type processing.
-    //  type t1
-    //   type(t2), pointer :: p1
-    // end type
-    // type t2
-    //   type(t3), pointer :: p2
-    // end type
-    // type t2
-    //   integer a
-    // end type
-    derivedTypeDepth++;
-    auto result = convertRecordType(recTy, fileAttr, scope, declOp);
-    derivedTypeDepth--;
-    return result;
+    return convertRecordType(recTy, fileAttr, scope, declOp);
   } else if (auto tupleTy = mlir::dyn_cast_if_present<mlir::TupleType>(Ty)) {
     return convertTupleType(tupleTy, fileAttr, scope, declOp);
   } else if (auto refTy = mlir::dyn_cast_if_present<fir::ReferenceType>(Ty)) {

diff  --git a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
index 93b9ac2d90fdf..854e4397ca32b 100644
--- a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
+++ b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
@@ -23,6 +23,77 @@
 
 namespace fir {
 
+/// Special cache to deal with the fact that mlir::LLVM::DITypeAttr for
+/// derived types may only be valid in specific nesting contexts in presence
+/// of derived type recursion and cannot be cached for the whole compilation.
+/// It is however still desirable to cache such mlir::LLVM::DITypeAttr as
+/// long as possible to avoid catastrophic compilation slow downs in very
+/// complex derived types where an intermediate type in a derived type cycle may
+/// indirectly appear hundreds of times under the top type of the derived type
+/// cycle. More details in the comment below.
+class DerivedTypeCache {
+public:
+  // Currently, the handling of recursive debug type in mlir has some
+  // limitations that were discussed at the end of the thread for following
+  // PR.
+  // https://github.com/llvm/llvm-project/pull/106571
+  //
+  // Problem could be explained with the following example code:
+  //  type t2
+  //   type(t1), pointer :: p1
+  // end type
+  // type t1
+  //   type(t2), pointer :: p2
+  // end type
+  // In the description below, type_self means a temporary type that is
+  // generated
+  // as a place holder while the members of that type are being processed.
+  //
+  // If we process t1 first then we will have the following structure after
+  // it has been processed.
+  // t1 -> t2 -> t1_self
+  // This is because when we started processing t2, we did not have the
+  // complete t1 but its place holder t1_self.
+  // Now if some entity requires t2, we will already have that in cache and
+  // will return it. But this t2 refers to t1_self and not to t1. In mlir
+  // handling, only those types are allowed to have _self reference which are
+  // wrapped by entity whose reference it is. So t1 -> t2 -> t1_self is ok
+  // because the t1_self reference can be resolved by the outer t1. But
+  // standalone t2 is not because there will be no way to resolve it. Until
+  // this is fixed in mlir, we avoid caching such types. Please see
+  // DebugTranslation::translateRecursive for details on how mlir handles
+  // recursive types.
+  using ActiveLevels = llvm::SmallVector<int32_t, 1>;
+  mlir::LLVM::DITypeAttr lookup(mlir::Type);
+  ActiveLevels startTranslating(mlir::Type,
+                                mlir::LLVM::DITypeAttr placeHolder = nullptr);
+  void finalize(mlir::Type, mlir::LLVM::DITypeAttr, ActiveLevels &&);
+  void preComponentVisitUpdate();
+  void postComponentVisitUpdate(ActiveLevels &);
+
+private:
+  void insertCacheCleanUp(mlir::Type type, int32_t depth);
+  void cleanUpCache(int32_t depth);
+  // Current depth inside a top level derived type being converted.
+  int32_t derivedTypeDepth = 0;
+  // Cache for already translated derived types with the minimum depth where
+  // this cache entry is valid. Zero means the translation is always valid, "i"
+  // means the type depends on its derived type tree parent node at depth "i". Such
+  // types should be cleaned-up from the cache in the post visit of node "i".
+  // Note that any new metadata created for a type with a component in the cache
+  // with validity of "i" shall not be added to the cache with a validity
+  // smaller than "i".
+  llvm::DenseMap<mlir::Type, std::pair<mlir::LLVM::DITypeAttr, ActiveLevels>>
+      typeCache;
+  // List of parent nodes that are being recursively referred to in the
+  // component type that has just been computed.
+  ActiveLevels componentActiveRecursionLevels;
+  // Helper list that maintains the list of nodes that must be deleted from the
+  // cache when going back past listed parent depths.
+  llvm::SmallVector<std::pair<llvm::SmallVector<mlir::Type>, int32_t>>
+      cacheCleanupList;
+};
+
 /// This converts FIR/mlir type to DITypeAttr.
 class DebugTypeGenerator {
 public:
@@ -91,8 +162,7 @@ class DebugTypeGenerator {
   std::uint64_t lenOffset;
   std::uint64_t rankOffset;
   std::uint64_t rankSize;
-  int32_t derivedTypeDepth;
-  llvm::DenseMap<mlir::Type, mlir::LLVM::DITypeAttr> typeCache;
+  DerivedTypeCache derivedTypeCache;
 };
 
 } // namespace fir

diff  --git a/flang/test/Integration/debug-cyclic-derived-type-caching-complex.f90 b/flang/test/Integration/debug-cyclic-derived-type-caching-complex.f90
new file mode 100644
index 0000000000000..72b7a78bef510
--- /dev/null
+++ b/flang/test/Integration/debug-cyclic-derived-type-caching-complex.f90
@@ -0,0 +1,116 @@
+! RUN: %flang_fc1 -emit-llvm -debug-info-kind=standalone %s -o - | FileCheck  %s
+
+! Test that debug metadata for derived types is not duplicated more than needed
+! when emitting debug info with non trivial cycles.
+
+
+! In the type graph below, G has a back edge to B, and F to D.
+! This causes C to be in the middle of B cycle, and E to be
+! both in B and D cycles.
+! C and E are used in several contexts, under B, under D, and outside
+! of it to test how metadata is generated for them.
+!
+! Without "local caching" of C and E when generating mlir::LLVM::DITypeAttr
+! for such derived types, many duplicate llvm metadata for the derived types
+! would be emitted, while with the right duplication of mlir::LLVM::DITypeAttr,
+! far fewer duplicate llvm IR metadata nodes end up emitted (19 DICompositeType
+! vs 71 before the patch that added this test).
+!
+!
+!  A -> B -> C -> D -> E -> F -> G -> B
+!  |    |         |         |
+!  |    |         |         | -> D
+!  |    |         |
+!  |    |         | -> H -> E
+!  |    |
+!  |    | -> I -> E
+!  |         | -> C
+!  |
+!  | -> C
+!  | -> E
+
+subroutine type_cycles_caching()
+  type g
+    type(b), pointer :: c_b
+  end type
+  type f
+    type(g) :: c_b
+    type(d), pointer :: c_d
+  end type
+  type e
+    type(f) :: c_f
+  end type
+    type h
+      ! Can reuse metadata of type(e) under 'd'.
+      type(e) :: c_e
+    end type
+  type d
+    type(e) :: c_e
+    type(h) :: c_h
+  end type
+  type c
+    type(d) :: c_d
+  end type
+    type i
+      ! Cannot reuse metadata of type(e) under 'd'.
+      type(e) :: c_e
+      ! Can reuse metadata of type(c) under 'b'.
+      type(c) :: c_c
+    end type
+  type b
+    type(c) :: c_c
+    type(i) :: c_i
+  end type
+  type a
+    type(b) :: c_b
+    ! Cannot reuse metadata of type(c) under 'b'
+    type(c) :: c_c
+    ! Cannot reuse metadata of type(e) under 'd', nor the one under 'b'
+    type(e) :: c_e
+  end type
+  type(a) :: xa
+  ! Can reuse metadata of type(c) for xa%c_c
+  type(c) :: xc
+  ! Can reuse metadata of type(e) for xa%c_e
+  type(e) :: xe
+  call bar(xa, xc, xe)
+end subroutine
+
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "a",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "b",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "c",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "d",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "e",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "f",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "g",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "h",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "i",
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "e"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "f"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "c"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "d"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "e"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "f"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "g"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "h"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "e"
+! CHECK-NOT: distinct !DICompositeType
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "f"
+! CHECK-NOT: distinct !DICompositeType

diff  --git a/flang/test/Integration/debug-cyclic-derived-type-caching-simple.f90 b/flang/test/Integration/debug-cyclic-derived-type-caching-simple.f90
new file mode 100644
index 0000000000000..c1f49d2f4a74a
--- /dev/null
+++ b/flang/test/Integration/debug-cyclic-derived-type-caching-simple.f90
@@ -0,0 +1,39 @@
+! RUN: %flang_fc1 -emit-llvm -debug-info-kind=standalone %s -o - | FileCheck  %s
+
+! Simple test that checks that metadata for `t0` is only duplicated once.
+
+! The difficulty is that at the mlir::LLVM::DITypeAttr, because of the
+! lack of MLIR attribute true recursion, the mlir::LLVM::DITypeAttr for
+! `t0` inside `t1` is special because it is only valid when found
+! in an mlir::LLVM::DITypeAttr tree under `base` (it will point to `base`
+! via an integer id that is only meaningful when a node with such id has
+! been traversed).
+! A different node has to be created for `t0` usage in `x0` (will
+! point to the actual mlir::LLVM::DITypeAttr for `base` instead of
+! an integer id since the cycle is already "taken care of" in `base`
+! definition).
+! However, the same special `t0` node can be used for both `x1_1` and
+! `x1_2` components because they are both under `base` in the
+! mlir::LLVM::DITypeAttr tree definition. This used to not be the case,
+! leading to a lot of duplicate mlir::LLVM::DITypeAttr and actual LLVM IR
+! metadata, causing noticeable compilation slowdowns in apps with non trivial
+! derived types.
+
+subroutine duplicate_cycle_branch()
+  type t0
+    type(base), pointer :: x
+  end type
+  type t1
+    type(t0) :: x1_1
+    type(t0) :: x1_2
+  end type
+  type base
+    type(t1) :: x_base
+  end type
+  type(base) :: x
+  type(t0) :: x0
+  call bar(x, x0)
+end subroutine
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t0",
+! CHECK: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t0",
+! CHECK-NOT: distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t0",