[clang] [clang] Fix the crash when dumping deserialized decls (PR #133395)

Fri Mar 28 05:29:43 PDT 2025

================
@@ -103,15 +120,30 @@ class DeserializedDeclsDumper : public DelegatingDeserializationListener {
       : DelegatingDeserializationListener(Previous, DeletePrevious) {}
 
   void DeclRead(GlobalDeclID ID, const Decl *D) override {
-    llvm::outs() << "PCH DECL: " << D->getDeclKindName();
-    if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
-      llvm::outs() << " - ";
-      ND->printQualifiedName(llvm::outs());
+    PendingDecls.push_back(D);
+    DelegatingDeserializationListener::DeclRead(ID, D);
+  }
+  void FinishedDeserializing() override {
+    auto Decls = std::move(PendingDecls);
+    for (const auto *D : Decls) {
+      llvm::outs() << "PCH DECL: " << D->getDeclKindName();
+      if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
+        llvm::outs() << " - ";
+        ND->printQualifiedName(llvm::outs());
+      }
+      llvm::outs() << "\n";
     }
-    llvm::outs() << "\n";
 
-    DelegatingDeserializationListener::DeclRead(ID, D);
+    if (!PendingDecls.empty()) {
----------------
ilya-biryukov wrote:

> I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.

> I think a broader question is: once deserialization is finished, can we safely assume that using a loaded declaration will never trigger additional deserialization?

No, we cannot assume that. E.g. we can load a function without a body and requesting a body at any other point in code may cause deserialization of the body itself, all declarations it references and so on.

> after deserialization is fully completed.

Deserializations get started and completed throughout the program many times and it's generally fine.

> One possible solution is to disallow this behavior, e.g we could add an assertion in ASTReader.cpp.

I don't think this works, actually. It's very hard to write code that does not deserialize. And it's probably not necessary to actually have that level of scrutiny. Deserializing from inside the callbacks in the deserialization itself is cheesy, but deserializing more outside of the deserialization is a perfectly valid use-case.

I would recommend a different approach and instead putting it on the author of the interface to figure out when they want to process their results.
E.g. `HandleTranslationUnit` or `HandleTopLevelDecl` from `ASTConsumer` are safe places. One happens once per invocation, another one happens more often (if you need more gradual results). Some users might prefer `EndSourceFile`.

We would require wiring up other callbacks (`ASTConsumer`, the whole `FrontendAction`, etc) in addition to deserialization listener. But that's already the case, e.g.`PPCallbacks` rarely live outside something else.
So maybe just remove this callback and rely on other ways to output the buffered declarations? How does that sound? And for the problem at hand...


> I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.

So maybe it crashed simply because we did not propagate the other callbacks? If that's the case, a more narrow change that does not add more methods to the interface would be enough. Or is that not the case?

https://github.com/llvm/llvm-project/pull/133395