[clang] [C++20][Modules] Do not update the declaration generation number if the redeclaration chain completion was delayed. (PR #129982)

Michael Park via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 14 14:34:13 PDT 2025


mpark wrote:

Alright folks, I've finally figured this out! I'll describe the high-level problem, what's happening specifically in the test case, and the solution.

# High-Level Problem

[`ASTReader::FinishedDeserializing`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/lib/Serialization/ASTReader.cpp#L10788) uses [`NumCurrentElementsDeserializing`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/include/clang/Serialization/ASTReader.h#L1176-L1177) to keep track of nested [`Deserializing`](https://github.com/llvm/llvm-project/blob/e6382f2111353f5af66bb660c2e0317c21c398ed/clang/include/clang/AST/ExternalASTSource.h#L76-L90) RAII actions. The `FinishedDeserializing` only performs actions if it is the top-level `Deserializing` layer. This works fine in general, but there is a problematic edge case.

If a call to [`redecls()`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/lib/Serialization/ASTReader.cpp#L10815) in `FinishedDeserializing` performs deserialization, we re-enter `FinishedDeserializing` while in the middle of the previous `FinishedDeserializing` call.

It looks something like:

```
      +-- ... 2 ... --+
      |               |
   +--+       1       +--+     +-----+
   |  ^SD2         FD2^  |     |     |
|--+          0          +-----+-----+-------|
   ^SD1               FD1^     ^SD3  ^FD3 (FD1 still in progress)
```

The known problematic part of this is that this inner `FinishedDeserializing` (`FD3` in the diagram) can go all the way to [`PassInterestingDeclsToConsumer`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/lib/Serialization/ASTReader.cpp#L10844), which operates on [`PotentiallyInterestingDecls`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/include/clang/Serialization/ASTReader.h#L1202-L1208) data structure which contain decls that must be handled by the `FD1` stage.

The other shared data structures are also somewhat concerning at a high-level in that the inner `FinishedDeserializing` would be handling pending actions that are not "within its scope", but this part is not known to be problematic.

# Specifics

We perform a look-up on `f` for the `f(0)` call where `f` is an overloaded function. One of which is `void f(const N::S &) {}`, where `S = BS<Empty<char>>`. In `finishPendingActions`, we get to [`RD->addedMember(MD);`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/lib/Serialization/ASTReader.cpp#L10478) where `RD` is the `ClassSpecializationDecl` of `Empty<char>` and `MD` is the destructor of `Empty<char>`.

Without #121245, the `ClassSpecializationDecl` of `Empty<char>` is left in the `PendingIncompleteDeclChains` data structure as we exit `finishPendingActions`. We get through the rest of `FinishedDeserializing`, and gets to `PassInterestingDeclsToConsumer`. `void f(const N::S &)` is identified as an interesting decl and gets name mangled. During name-mangling, we call `redecls()` on `N`, `__1`, `BS`, and `Empty` (since `N::S` is really `N::__1::BS<Empty<char>>`) and everything is good.

With #121245, the `ClassSpecializationDecl` of `Empty<char>` gets marked incomplete as we exit `finishPendingActions`. In the second part of `FinishedDeserializing`, the call to [`Update.second->redecls()`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/lib/Serialization/ASTReader.cpp#L10815) is made where `Update.second` is the `ClassSpecializationDecl` of `Empty<char>`. This time, since `Empty<char>` is marked incomplete, redecl chain completion logic is triggered.

Since we're in the `NumCurrentElementsDeserializing == 0` part of `FinishedDeserializing`, we do not merely add to `PendingIncompleteDeclChains`, but rather get to the [`DC->lookup(Name)`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/lib/Serialization/ASTReader.cpp#L7860) call in `ASTReader::CompleteRedeclChain` with `DC = __1` and `Name = "Empty"`.

The implementation of `DC->lookup` ensures that the `DC` is up-to-date by invoking [`getMostRecentDecl()` on `DC`](https://github.com/llvm/llvm-project/blob/release/20.x/clang/lib/AST/DeclBase.cpp#L1884-L1889). This happens recursively for `DC = __1` and `Name = "Empty"`, then `DC = N` and `Name = "__1"`. So now, `N` just completed its redecl chain, and [`FindExternalVisibleDeclsByName`](https://github.com/llvm/llvm-project/blob/c6d95c441a29a45782ff72d6cb82839b86fd0e4a/clang/lib/Serialization/ASTReader.cpp#L8478-L8527) is called from [here](https://github.com/llvm/llvm-project/blob/release/20.x/clang/lib/AST/DeclBase.cpp#L1913) with `DC = N` and `Name = "__1"`. Note that at this point, we haven't gotten to actually performing `FindExternalVisibleDeclsByName` with `DC = __1` and `Name = "Empty"` which will populate new declarations/definitions to `__1`. There is an instance of `Deserializing` inside `FindExternalVisibleDeclsByName` [here](https://github.com/llvm/llvm-project/blob/c6d95c441a29a45782ff72d6cb82839b86fd0e4a/clang/lib/Serialization/ASTReader.cpp#L8490), end of which of course is an invocation to `FinishedDeserializing`. This call gets to `PassInterestingDeclsToConsumer` with `void f(const N::S &)` inside of it from before, and tries to mangle the name. During name-mangling, we call `redecls()` on `N`, `__1`, `BS`, and `Empty` as is without #121245, but this time we're in the middle of a recursive `FinishedDeserializing` invocation, where the inner `PassInterestingDeclsToConsumer` is processing entries that are not ready to be processed because we're still in the middle of preparing them.

# Solutions

As described in the High-level Problem section, the big picture problem is the recursive nature of `FinishedDeserializing` where we can perform further deserialization within `FinishedDeserializing`, and the inner `FinishedDeserializing` call operates on the shared data structures such as `Pending*` and `PotentiallyInterestingDecls`.

## Proposed Solution

We already have a guard within `PassInterestingDeclsToConsumer` because we can end up with recursive deserialization within `PassInterestingDeclsToConsumer`. The proposed solution is to apply this guard to the portion of `FinishedDeserializing` that performs further deserialization as well. This ensures that recursive deserialization does not trigger `PassInterestingDeclsToConsumer` which may operate on entries that not ready to be passed.

## Alternative: Save and restore the `PotentiallyInterestingDecls`

An alternative approach would be to save the `PotentiallyInterestingDecls` on the side while recursive deserialization is happening, such that we **do** invoke `PassInterestingDeclsToConsumer`, but only with the decls that are within its scope. This approach works, and has been tested. The reason this approach is not taken is because in this approach in principle, we should also save other shared data structures on the side as well, and the proposed solution fit in nicer with the existing mechanism.

## Alternative: Guard recursive entering of the second part of `FinishedDeserializing`

This approach adds a new field `bool FinishingDeserializing = false;` to `ASTReader`, and guards the `NumCurrentElementsDeserializing == 0` portion of the `FinishedDeserializing` like so:
```diff
  if (NumCurrentElementsDeserializing == 0) {
+   if (FinishingDeserializing)
+     return;

+   // Guard variable to avoid recursively redoing the process of passing
+   // decls to consumer.
+   SaveAndRestore GuardFinishingDeserializing(FinishingDeserializing, true);

    while (!PendingExceptionSpecUpdates.empty() ||
           !PendingDeducedTypeUpdates.empty() ||
           !PendingUndeducedFunctionDecls.empty()) {
      // ...
    }

    if (ReadTimer)
      ReadTimer->stopTimer();

    diagnoseOdrViolations();

    // We are not in recursive loading, so it's safe to pass the "interesting"
    // decls to the consumer.
    if (Consumer)
      PassInterestingDeclsToConsumer();
  }
```

This mirrors the current mechanism of the [`PassingDeclsToConsumer` guard](https://github.com/llvm/llvm-project/blob/c6d95c441a29a45782ff72d6cb82839b86fd0e4a/clang/lib/Serialization/ASTReaderDecl.cpp#L4286-L4291). This approach passes the test-suite. However, I don't believe it's quite correct.

Suppose we're at the `diagnoseOdrViolations()` part in the outer `FinishedDeserializing` where a `redecls()` is invoked. We could have a recursive deserialization at that point, and with this approach, the inner `FinishedDeserializing` would simply return rather than processing the `Pending*` data structures. We can't just rely on the outer `FinishedDeserializing` to finish the job.

https://github.com/llvm/llvm-project/pull/129982


More information about the cfe-commits mailing list