[PATCH] D41102: Setup clang-doc frontend framework

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jan 31 01:09:58 PST 2018


sammccall added a comment.

Really sorry about the delay in getting to this.
At a high level, I'm most concerned about:

- the monolithic in-memory intermediate format, which seems to put hard limits on performance and scalability
- having high-level documentation and clear APIs
- having multiple intermediate formats



================
Comment at: test/Tooling/clang-doc-basic.cpp:3
+// RUN: mkdir %t
+// RUN: echo '[{"directory":"%t","command":"clang++ -c %t/test.cpp","file":"%t/test.cpp"}]' | sed -e 's/\\/\//g' > %t/compile_commands.json
+// RUN: cp "%s" "%t/test.cpp"
----------------
nit: you can also just write compile_flags.txt, which in this case would be empty


================
Comment at: test/Tooling/clang-doc-basic.cpp:22
+ // CHECK: ---
+ // CHECK: Qualified Name:  ''
+ // CHECK: Name:            ''
----------------
what is this? The TU? The global namespace?
What's the value in emitting it?


================
Comment at: tools/clang-doc/ClangDoc.h:1
+//===-- ClangDoc.cpp - ClangDoc ---------------------------------*- C++ -*-===//
+//
----------------
This needs some high-level documentation: what does the clang-doc library do, what's the main user (clang-doc command-line tool), what are the major moving parts.

I don't personally have a strong opinion on how this is split between this header / the implementation / a documentation page for the tool itself, but we'll probably need *something* for each of those.

(I think it's OK to defer the user-facing documentation to another patch, but we should do it before the tool becomes widely publicized or included in an llvm release)


================
Comment at: tools/clang-doc/ClangDoc.h:27
+
+// A Context which contains extra options which are used in ClangMoveTool.
+struct ClangDocContext {
----------------
what's clangmovetool?


================
Comment at: tools/clang-doc/ClangDoc.h:27
+
+// A Context which contains extra options which are used in ClangMoveTool.
+struct ClangDocContext {
----------------
sammccall wrote:
> what's clangmovetool?
nit: this sounds more like "options" than a context to me, though there's only one member to go on :-)


================
Comment at: tools/clang-doc/ClangDoc.h:29
+struct ClangDocContext {
+  // Which format in which to emit representation.
+  OutFormat EmitFormat;
----------------
Is this the intermediate representation referred to in the design doc, or the final output format?

If the former, why two formats rather than picking one?
YAML is nice for being usable by out-of-tree tools (though not as nice as JSON). But it seems like providing YAML as a trivial backend format would fit well?
Bitcode is presumably more space-efficient - if this is significant in practice it seems like a better choice.


================
Comment at: tools/clang-doc/ClangDoc.h:33
+
+class ClangDocVisitor : public RecursiveASTVisitor<ClangDocVisitor> {
+public:
----------------
This API makes essentially everything public. Is that the intent?

It seems like `ClangDocVisitor` is a detail, and the operation you want to expose is "extract doc from this AST into this reporter" or maybe "create an AST consumer that feeds this reporter".

It would be useful to have an API to extract documentation from individual AST nodes (e.g. a Decl). But I'd be nervous about trying to use the classes exposed here to do that. If it's efficiently possible, it'd be nice to expose a function.
(one use case for this is clangd)


================
Comment at: tools/clang-doc/ClangDoc.h:39
+
+  bool VisitTagDecl(const TagDecl *D);
+  bool VisitNamespaceDecl(const NamespaceDecl *D);
----------------
`override` where applicable


================
Comment at: tools/clang-doc/ClangDoc.h:80
+
+class ClangDocActionFactory : public tooling::FrontendActionFactory {
+public:
----------------
this class can definitely be hidden in the c++ file, behind a newClangDocActionFactory() func
(actually I think newFrontendActionFactory in Tooling.h could be extended to cover this, but not 100% sure)


================
Comment at: tools/clang-doc/ClangDocReporter.cpp:1
+//===-- ClangDocReporter.cpp - ClangDoc Reporter ----------------*- C++ -*-===//
+//
----------------
It looks like the plan for merging data across sources is to hold all information in one in-memory structure and incrementally add to it as you get information from TUs.
(This should be documented somewhere!)

This seems somewhat hostile to parallel processing: you're going to need to synchronize access to the structs owned by the ClangDocReporter if you want to gather from multiple TUs at once. Moreover, documenting large codebases using multiple machines in parallel seems very difficult.
And obviously it assumes you can fit the generated documentation for the codebase in memory, which would be nice to avoid.

Have you considered a mapreduce-like architecture, where the mapper gets AST callbacks and spits out data, and the reducer is responsible for assembling all the data together?

We don't have good framework support for mapreduce in open-source clang (we do have something internally at google, and I'd really like to better support this pattern in libtooling).
Still, you can see a simple example of this pattern (albeit with a trivial reducer) in clang/tools/extra/clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp.


================
Comment at: tools/clang-doc/ClangDocReporter.h:1
+//===-- Doc.cpp - ClangDoc --------------------------------------*- C++ -*-===//
+//
----------------
nit: header is out of sync with the filename

Each header should have a high level description of what this component is and how it fits into the system.


================
Comment at: tools/clang-doc/ClangDocReporter.h:33
+
+enum class OutFormat { YAML, LLVM };
+
----------------
document :-)
I'm not sure "LLVM" is a suitable name for the bitcode format.
It probably makes sense just to name this as being "clang-doc's binary format", maybe commenting that it's related to LLVM bitcode. It's really an implementation detail: these files won't be (I think?) interoperable with any other tools that process bitcode.


================
Comment at: tools/clang-doc/ClangDocReporter.h:124
+
+class ClangDocReporter : public ConstCommentVisitor<ClangDocReporter> {
+public:
----------------
What's the relationship between ClangDocReporter, the classes in ClangDoc.h and the intermediate format?

If the reporter is fixed-function and always writes the intermediate format, then it seems something of a confusing name - I'd have expected "Reporter" to be an abstract set of callbacks e.g. for producing different final output formats.
In that case, it seems like even injecting (into ClangDoc.h classes) the thing that builds the intermediate format is overkill - why not just inject a sink for the structs that get produced?


https://reviews.llvm.org/D41102





More information about the cfe-commits mailing list