[clang-tools-extra] [clangd] textDocument/documentLink to support include statements with macro argument (PR #137550)

Thu Jul 31 05:08:38 PDT 2025

================
@@ -859,17 +859,56 @@ std::vector<DocumentLink> getDocumentLinks(ParsedAST &AST) {
   for (auto &Inc : AST.getIncludeStructure().MainFileIncludes) {
     if (Inc.Resolved.empty())
       continue;
+
+    // Get the location of the '#' symbol of the "#include ..." statement.
     auto HashLoc = SM.getComposedLoc(SM.getMainFileID(), Inc.HashOffset);
+
+    // Get the '#' token itself, then use std::next to reach the "include"
+    // token and the first token after it (the "file token").
     const auto *HashTok = AST.getTokens().spelledTokenContaining(HashLoc);
     assert(HashTok && "got inclusion at wrong offset");
     const auto *IncludeTok = std::next(HashTok);
     const auto *FileTok = std::next(IncludeTok);
-    // FileTok->range is not sufficient here, as raw lexing wouldn't yield
-    // correct tokens for angled filenames. Hence we explicitly use
-    // Inc.Written's length.
-    auto FileRange =
-        syntax::FileRange(SM, FileTok->location(), Inc.Written.length())
-            .toCharRange(SM);
+
+    // The file token can be of kind:
+    // - "less" if using the "#include <h-char-sequence> new-line" syntax,
+    // - "string_literal" if using the "#include "q-char-sequence" new-line"
+    //   syntax,
+    // - something else (most likely "identifier") if using the
+    //   "#include pp-tokens new-line" syntax (#include with a macro argument).
+
+    CharSourceRange FileRange;
+
+    if (FileTok->kind() == tok::TokenKind::less) {
+      // FileTok->range would only include the '<' char. Hence we explicitly use
+      // Inc.Written's length.
+      FileRange =
+          syntax::FileRange(SM, FileTok->location(), Inc.Written.length())
+              .toCharRange(SM);
+    } else if (FileTok->kind() == tok::TokenKind::string_literal) {
+      // FileTok->range includes the quotes for string literals so just return
+      // it.
+      FileRange = FileTok->range(SM).toCharRange(SM);
+    } else {
+      // FileTok is the first Token of a macro spelling
+      // We can use the AST to get the macro expansion from the spelling
+      // starting at FileTok and use the expansion to get all the spelled Tokens
+      // that expanded to it
+
+      auto OptExpansion = AST.getTokens().expansionStartingAt(FileTok);
----------------
kadircet wrote:

I am a little worried about this: we definitely have some stability issues around token buffers, especially combined with certain optimizations we have in clangd (like preamble patching).

Is it really worth reporting the precise token range for macro-expanded includes? The trade-off I have in mind is reporting just the first token of that expansion. The downside is that we aren't going to cover all of `MACRO_CALL(X)`, but both the logic here and the runtime behavior will likely be a lot sturdier.

We usually hit crashes when things go wrong with token buffers; IMO this is worse than having slightly off ranges. WDYT?
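
To make the trade-off concrete, here is a rough, self-contained sketch of the "first token only" fallback. It deliberately does not use the real clang types: `Kind`, `Tok`, `Range` and `linkRange` are hypothetical stand-ins for `tok::TokenKind`, `syntax::Token`, a character range and the range computation in `getDocumentLinks`, and the offsets are plain byte offsets into the main file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-ins (hypothetical, not the clang API).
enum class Kind { Less, StringLiteral, Identifier };

struct Tok {
  Kind K;
  std::size_t Offset; // byte offset of the token in the main file
  std::size_t Length; // length of the token's spelling
};

struct Range {
  std::size_t Begin;
  std::size_t End; // half-open: [Begin, End)
};

// Written is the include target as spelled for the angled and quoted forms,
// e.g. "<vector>" or "\"a.h\"". It is ignored for the macro form.
Range linkRange(const Tok &FileTok, const std::string &Written) {
  switch (FileTok.K) {
  case Kind::Less:
    // The file token only covers '<', so use the written length instead.
    assert(!Written.empty() && Written.front() == '<');
    return {FileTok.Offset, FileTok.Offset + Written.size()};
  case Kind::StringLiteral:
    // The token already covers the quotes and the file name.
    return {FileTok.Offset, FileTok.Offset + FileTok.Length};
  case Kind::Identifier:
    // Macro form: report only the first spelled token and never look up the
    // expansion. For `MACRO_CALL(X)` this covers `MACRO_CALL` but not `(X)`.
    return {FileTok.Offset, FileTok.Offset + FileTok.Length};
  }
  return {FileTok.Offset, FileTok.Offset}; // not reached for valid kinds
}
```

For `#include MACRO_CALL(X)` the file token starts at offset 9 and is 10 characters long, so the link would span `[9, 19)` and leave `(X)` uncovered. That is the "slightly off" range, obtained without any expansion lookup.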

https://github.com/llvm/llvm-project/pull/137550