[PATCH] D143112: [clang] Support parsing comments without ASTContext

Mon Feb 13 06:06:55 PST 2023

kadircet added a comment.

Thanks, this looks like a good start. I left some comments about the details of implementation, as for placing and APIs in general, i think it's useful to see how things will be built on top inside clangd to make the right call here.

================
Comment at: clang/include/clang/AST/RawCommentList.h:159
+  /// Parse the \p Comment, storing the result in \p Allocator
+  static comments::FullComment *parse(llvm::StringRef Comment,
+                                      llvm::BumpPtrAllocator &Allocator,
----------------
tom-anders wrote:
> Not sure if this is the right place for this? Could be moved to `FullComment` or made a free function instead?
i think this would make more sense inside clangd as a free function when we're processing comments. it's nice to get to re-use some Lexer/Sema/Parser creation logic, but i don't think it's big enough to justify a new public interfaces that's solely used by clangd.
if we can figure out more patterns that we can migrate (the unittest in this patch is a good example, but still just a unittest and doesn't benefit much from code reuse) we can consider lifting this logic up to a common place. the other users should hopefully make it easier to decide where this code should live.

================
Comment at: clang/lib/AST/RawCommentList.cpp:227
+                      SourceMgr.getLocForStartOfFile(SourceMgr.getMainFileID()),
+                      Allocator, SourceMgr.getDiagnostics(), Traits);
+}
----------------
tom-anders wrote:
> I wasn't sure if it's worth making the `Diagnostics` parameter optional here - Our `SourceMgr` already has it anyway, does it impact performance significantly to use it, even though we don't need it (yet)?
as things stand, Lexer, Sema and Parser expect a non-null diagnosticsengine so we have to pass something (i guess that's what you meant by making it optional).

the performance implications comes from two places:
1- extra analysis being performed in the face of malformed comments to produce "useful" diagnostics
2- creation cost of diagnostic itself, as all the data about ranges, decls etc. are copied around when constructing a diagnostic.

i don't know how costly the first action is today, it's unfortunately the case that would require a lot of code changes, if it's happening quite often in practice (especially when we're in codebases without proper doxygen comments)
the latter is somewhat easier to address by providing our own "mock" diagnostics engine, that'll just drop all of these extra diagnostic data on the floor instead of copying around.

i think taking a look at the codebase for the first (searching for `Diag(...) <<` patterns in Comment{Lexer,Sema,Parser}.cpp should give some ideas) and assessing whether we're performing any extra complicated analysis would be a good start. i would expect us not to, and if that's the case i think we can leave it at that and don't try to make code changes.
the latter we can address easily and we should.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143112/new/

https://reviews.llvm.org/D143112