[PATCH] D41102: Setup clang-doc frontend framework

Tue Feb 6 16:18:54 PST 2018

juliehockett added inline comments.

================
Comment at: tools/clang-doc/ClangDoc.h:29
+struct ClangDocContext {
+  // Which format in which to emit representation.
+  OutFormat EmitFormat;
----------------
sammccall wrote:
> juliehockett wrote:
> > sammccall wrote:
> > > Is this the intermediate representation referred to in the design doc, or the final output format?
> > > 
> > > If the former, why two formats rather than picking one?
> > > YAML is nice for being usable by out-of-tree tools (though not as nice as JSON). But it seems like providing YAML as a trivial backend format would fit well?
> > > Bitcode is presumably more space-efficient - if this is significant in practice it seems like a better choice.
> > That's the idea -- for developing purposes, I wrote up the YAML output first for this patch, and there will be a follow-on patch expanding the bitcode/binary output. I've updated the flags to default to the binary, with an option to dump the yaml (rather than the other way around).
> What's still not clear to me is: is YAML a) a "real" intermediate format, or b) just a debug representation?
> 
> I would suggest for orthogonality that there only be one intermediate format, and that any debug version be generated from it. In practice I guess this means:
>  - the reporter builds the in-memory representation
>  - you can serialize/deserialize memory representation to the IR (bitcode)
>  - you can serialize memory representation to debug representation (YAML) but not parse
>  - maybe the clang-doc core should *only* know about IR, and YAML should be produced in the same way e.g. HTML would be?
> 
> This does pose a short-term problem: the canonical IR is bitcode, we need YAML for the lit tests, and we don't have the decoder/transformer part yet. This could be solved either by using YAML as the IR *for now* and switching later, or by adding a simple decoder now.
> Either way it points to the *reporter* not having an output format option, and having to support two formats.
> 
> WDYT? I might be missing something here.
The mapper now only has the ability to write to bitcode -- I'm working on writing up a simple decoder to use for testing and will update the patch again once that's working. Once that's in place, that will also serve the purpose of being the foundation for how we're going to read the bitcode into the backend to produce actual docs. Does that make sense?

================
Comment at: tools/clang-doc/ClangDoc.h:33
+
+class ClangDocVisitor : public RecursiveASTVisitor<ClangDocVisitor> {
+public:
----------------
sammccall wrote:
> juliehockett wrote:
> > jakehehrlich wrote:
> > > sammccall wrote:
> > > > This API makes essentially everything public. Is that the intent?
> > > > 
> > > > It seems like `ClangDocVisitor` is a detail, and the operation you want to expose is "extract doc from this AST into this reporter" or maybe "create an AST consumer that feeds this reporter".
> > > > 
> > > > It would be useful to have an API to extract documentation from individual AST nodes (e.g. a Decl). But I'd be nervous about trying to use the classes exposed here to do that. If it's efficiently possible, it'd be nice to expose a function.
> > > > (one use case for this is clangd)
> > > Correct me if I'm wrong but I believe that everything needs to be public in this case because the base class needs to be able to call them. So the visit methods all need to be public.
> > Yes to the `Visit*Decl` methods being public because of the base class.
> > 
> > That said, I shifted a few things around here and implemented it as a `MatcherFinder` instead of a `RecursiveASTVisitor`. The change will allow us to make most of the methods private, and have the ability to fairly easily implement an API for pulling a specific node (e.g. by name or by decl type). As far as I understand (and please correct me if I'm wrong), the matcher traverses the tree in a similar way. This will also make mapping through individual nodes easier.
> Sorry for being vague - yes overridden or "CRTP-overridden" methods may need to be public.
> I meant that the classes themselves don't need to be exposed, I think. (The header could just expose a function to create the needed ones, that returns `unique_ptr<interface>`
> 
> There are now fewer classes exposed here, but I think most/all of them can still reasonably be hidden.
So I've restructured this again and collapsed all of the tooling things into to ExecutionContext. The only thing exposed here now is the callback, which is registered on the matcher. Is there anything else I'm missing?

================
Comment at: tools/clang-doc/ClangDocReporter.h:87
   std::string MangledName;
   std::string DefinitionFile;
   std::string ReturnType;
----------------
Athosvk wrote:
> Seems common to almost all Info structs, so you can probably move it to the/some base.
> 
> Namespaces do seem unrelated, so maybe you can make another struct inbetween? E.g. something like struct SymbolInfo : Info which contains a field for DefinitionFile and Locations (since that may not be used for namespaces either).
> 
> Additionally, what will you do when you merge this output information from multiple compilation untis back together? Only one should have the DefinitionFile set as the other compilation units won't see the definition. What happens if a function stays undefined? Can you generate documentation for it?
Some of this is addressed in the mapper refactoring, but the thought about separating out the namespace with another layer is a good one. Let me know what you think of the update!

================
Comment at: tools/clang-doc/ClangDocReporter.h:89
   std::string ReturnType;
   llvm::SmallVector<NamedType, 4> Params;
   AccessSpecifier Access;
----------------
Athosvk wrote:
> Perhaps you could already attach the parameter comments to this? So something like a 
> 
> ```
> struct ParamInfo
> {
>     NamedType Type;
>     std::string/CommentInfo Description/CommentInfo;
> }
> ```
> Or are you planning to keep this to a later stage? At this point it seems like the backend will have to parse a CommentInfo struct to attach comments to parameters etc. manually
That's a good point--is makes sense to attach a comment to any named type, since class members and whatnot can also have them. That said, it can't really be done until the declaration with the documentation comment is seen, so with the new MR framework this might be a task to push off to the reducer (which will be the next patch).

https://reviews.llvm.org/D41102