[clang] Implement `-dump-deserialized-declaration-ranges` flag. (PR #133910)

Tue Apr 8 09:12:39 PDT 2025

================
@@ -49,6 +54,185 @@ LLVM_INSTANTIATE_REGISTRY(FrontendPluginRegistry)
 
 namespace {
 
+/// DeserializedDeclsLineRangePrinter dumps ranges of deserialized declarations
+/// to aid debugging and bug minimization. It implements ASTConsumer and
+/// ASTDeserializationListener, so that an object of
+/// DeserializedDeclsLineRangePrinter registers as its own listener. The
+/// ASTDeserializationListener interface provides the DeclRead callback that we
+/// use to collect the deserialized Decls. Note that printing or otherwise
+/// processing them as this point is dangerous, since that could trigger
+/// additional deserialization and crash compilation. Therefore, we process the
+/// collected Decls in HandleTranslationUnit method of ASTConsumer. This is a
+/// safe point, since we know that by this point all the Decls needed by the
+/// compiler frontend have been deserialized. In case our processing causes
+/// further deserialization, DeclRead from the listener might be called again.
+/// However, at that point we don't accept any more Decls for processing.
+class DeserializedDeclsLineRangePrinter : public ASTConsumer,
+                                          ASTDeserializationListener {
+public:
+  explicit DeserializedDeclsLineRangePrinter(
+      SourceManager &SM, std::unique_ptr<llvm::raw_fd_ostream> OS)
+      : ASTDeserializationListener(), SM(SM), OS(std::move(OS)) {}
+
+  ASTDeserializationListener *GetASTDeserializationListener() override {
+    return this;
+  }
+
+  void DeclRead(GlobalDeclID ID, const Decl *D) override {
+    if (!IsCollectingDecls)
+      return;
+    if (!D || isa<TranslationUnitDecl>(D) || isa<LinkageSpecDecl>(D) ||
+        isa<NamespaceDecl>(D)) {
+      // These decls cover a lot of nested declarations that might not be used,
+      // reducing the granularity and making the output less useful.
+      return;
+    }
+    if (auto *DC = D->getDeclContext(); !DC || !DC->isFileContext()) {
+      // We choose to work at namespace level to reduce complexity and the
+      // number of cases we care about.
+      return;
+    }
+    PendingDecls.push_back(D);
+  }
+
+  struct Position {
+    unsigned Line;
+    unsigned Column;
+
+    bool operator<(const Position &other) const {
+      if (Line < other.Line)
+        return true;
+      if (Line > other.Line)
+        return false;
+      return Column < other.Column;
+    }
+
+    static Position GetBeginSpelling(const SourceManager &SM,
+                                     const CharSourceRange &R) {
+      SourceLocation Begin = R.getBegin();
+      return {SM.getSpellingLineNumber(Begin),
+              SM.getSpellingColumnNumber(Begin)};
+    }
+
+    static Position GetEndSpelling(const SourceManager &SM,
+                                   const CharSourceRange &Range,
+                                   const LangOptions &LangOpts) {
+      // For token ranges, compute end location for end character of the range.
+      // The end location of returned range is exclusive.
----------------
emaxx-google wrote:

If we want to completely mark a line having N characters, will it point to column N+1, or will it point to column 1 of the next line? The notion of "exclusive" is not obvious from this perspective, and the difference is that the latter includes the newline character. The former looks slightly more intuitive to me, but ultimately the minimizer can work with either, as long as it knows what to expect.

https://github.com/llvm/llvm-project/pull/133910