[cfe-dev] RFC: clang-doc proposal

Tue Dec 5 01:48:52 PST 2017

On Mon, Dec 4, 2017 at 11:21 PM, Julie Hockett via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
Hi.
I'm pretty much just a user, but I have some thoughts.

> This proposal is to build a new Clang tool for generating C/C++
> documentation, similar to other modern documentation generators such as
> rustdoc.  This tool would be a modular and extensible documentation
> generator for C/C++ code. It also would introduce a more elegant way of
> documenting C/C++ code, as well as maintaining backwards-compatibility with
> Doxygen-style markup to generate documentation.
>
> Today, Doxygen is a de-facto standard for generating C/C++ documentation.
> While widely used, the tool itself is a bit cumbersome, its output is both
> aesthetically and functionally lacking, and the non-permissive license
> combined with an outdated codebase make any improvements difficult. This new
> tool would aim to simplify the overhead of generating documentation,
> integrating it into a Clang tool as well as allowing existing comments to
> continue to be used. It would also allow for relatively easy adaptation to
> new language features, as it would be built on the Clang parser and would
> use the Clang AST to generate documentation.
Sounds awesome so far.

> Proposed Tool
>
> The proposed tool would consist of two parts. First, it would have a
> frontend that consumes the Clang-generated AST and generates an intermediate
> representation of the code and documentation structure, including additional
> Markdown files. Second, it would have a set of backend modules that consume
> that representation and output documentation in a particular format (e.g.
> Markdown, HTML/website, etc.).
>
> The frontend would be a new tool that uses the Clang parser, which can
> already parse C/C++ documentation comments (using the -Wdocumentation option).
> It can be easily used through the LibTooling interface, similarly to other
> Clang tools such as clang-check or clang-format. The initial steps in this
> project would be to build this tool using Clang's documentation parser. This
> tool would be able to attach comments to functions, types, and macros
> and resolve declaration references, both of which will be useful in
> generating effective documentation. Since a good deal of existing C/C++ code
> uses the Doxygen documentation comment style, which is also supported by
> Clang's parser (and Doxygen itself can use Clang to parse these comments),
> this is the syntax we are going to support as well. In the future, we would
> also like to support Markdown-style comments, akin to Apple Swift Markup.
>
> For implementation, this tool will use the JSON Compilation Database format
> to integrate with existing build systems. It would also have subcommands to
> choose which parts of the code will be documented (e.g. all code, all public
> signatures, all comment-documented signatures). Once the code is processed,
> the tool will write out its internal representation of the documentation
> in an intermediate format, encapsulating the necessary information
> about the code, comments, and structure. This will allow backend tools to
> take the output and transform it as necessary.
>
> The backend modules would cover different possible outputs for the defined
> intermediate representation. Each module will consume the representation and
> output documentation in a specific format. Initially, we propose to focus on
> a module that generates Markdown files, in order to make the first version
> as simple as possible. Markdown files are automatically rendered on a number
> of sites and systems, as well as being clear and uncluttered in raw text
> form. It is also relatively easy to convert Markdown files into other
> formats, making it a good starting target. An additional module would target
> HTML/website output.
While I understand the reasoning, I'm not sure the backends are a great idea.
TL;DR: how about *only* outputting RST (well, or MD) and delegating the
rest to Sphinx? This *should* allow for native integration into Sphinx-based
documentation, which is currently not achievable with Doxygen.
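
To make the RST suggestion a bit more concrete, here is a minimal sketch of
what a single RST emitter could look like. It is an illustration only: the
FunctionDoc and ParamDoc structs and the emitRST function are hypothetical
names, not part of the proposal, and the output assumes Sphinx's C++ domain
(the `.. cpp:function::` directive with `:param:` fields).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory description of one documented parameter.
struct ParamDoc {
  std::string Type;
  std::string Name;
  std::string Doc;
};

// Hypothetical in-memory description of one documented function.
struct FunctionDoc {
  std::string Name;
  std::string ReturnType;
  std::string Brief;
  std::vector<ParamDoc> Params;
};

// Renders one function as an RST block, assuming the Sphinx C++ domain.
std::string emitRST(const FunctionDoc &F) {
  std::ostringstream OS;
  OS << ".. cpp:function:: " << F.ReturnType << " " << F.Name << "(";
  for (std::size_t I = 0; I < F.Params.size(); ++I) {
    if (I != 0)
      OS << ", ";
    OS << F.Params[I].Type << " " << F.Params[I].Name;
  }
  OS << ")\n\n";
  // The directive body is indented by three spaces.
  if (!F.Brief.empty())
    OS << "   " << F.Brief << "\n\n";
  for (const ParamDoc &P : F.Params)
    OS << "   :param " << P.Name << ": " << P.Doc << "\n";
  return OS.str();
}
```

With something like this, the tool would only need to write .rst files next
to the hand-written ones, and Sphinx would handle HTML, PDF and the rest.
Whether Sphinx accepts every C++ signature verbatim is not something this
sketch checks.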

> Intermediate Format
>
> The frontend would process the code and comments into an output, to be
> consumed by the backend. This representation would be internally represented
> as a set of classes and structs. Once the frontend has finished, it would
> write this representation to a file. While existing tools like Doxygen emit
> XML, XML is somewhat restrictive and bulky. Also, in order to fully use XML,
> the tool would need to define the representation twice (once for the
> internal classes/structs, once in the XML schema). So, we are instead
> considering two possible formats for this intermediate step: LLVM bitstream
> and JSON/YAML.
>
> LLVM bitstream format is space-efficient, and is natively written out by the
> Clang parser. It has the benefit of being similar to existing clang
> functionality, as the compiler frontend writes out its AST into the
> bitstream format to pass along to the LLVM backend. Using this format would
> allow the tool to emit the representation with minimal manipulation or
> additional parsing.
>
> Alternatively, JSON/YAML, while less space-efficient than bitstream, are
> human-readable and widely extensible. Neither has formal grammar or
> namespacing support, so if the tool needed rules of the sort it would need
> to define them itself on the frontend and require that the backend modules
> know them. While this would require a bit more parsing to emit on the
> frontend and load on the backend, the representation would be able to stand
> separately from the tool, and the backend modules would not necessarily need
> an understanding of the LLVM bitstream to load it.
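
As a rough illustration of the JSON option, the sketch below serializes a
small record by hand. The DeclInfo struct, its field names, and the
escapeJSON/toJSON helpers are assumptions made up for this example; the
proposal does not define a schema, and a real tool would more likely use an
existing JSON or YAML library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one documented declaration; the field names are
// assumptions, not part of the proposal.
struct DeclInfo {
  std::string Kind;    // e.g. "function"
  std::string Name;    // declaration name
  std::string Comment; // raw documentation text
};

// Escapes quotes, backslashes, newlines and tabs for use inside a JSON
// string. Other control characters are not handled in this sketch.
std::string escapeJSON(const std::string &S) {
  std::string Out;
  for (char C : S) {
    switch (C) {
    case '"':  Out += "\\\""; break;
    case '\\': Out += "\\\\"; break;
    case '\n': Out += "\\n";  break;
    case '\t': Out += "\\t";  break;
    default:   Out += C;      break;
    }
  }
  return Out;
}

// Serializes one record as a JSON object.
std::string toJSON(const DeclInfo &D) {
  return "{\"kind\":\"" + escapeJSON(D.Kind) + "\",\"name\":\"" +
         escapeJSON(D.Name) + "\",\"comment\":\"" + escapeJSON(D.Comment) +
         "\"}";
}

// Serializes a list of records as a JSON array.
std::string toJSON(const std::vector<DeclInfo> &Decls) {
  std::string Out = "[";
  for (std::size_t I = 0; I < Decls.size(); ++I) {
    if (I != 0)
      Out += ",";
    Out += toJSON(Decls[I]);
  }
  Out += "]";
  return Out;
}
```

The point of the sketch is only that such output can be read by any backend
with a JSON parser, without knowledge of the LLVM bitstream.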

I'm not seeing any mention of graph/diagram generation.

> Extensions
>
> In addition to generating documentation from comments, a future extension
> would be to automatically generate and insert boilerplate comments into the
> code on demand. As the tool would have access to the AST, it could insert
> comments into the code similar to how tools like clang-tidy and clang-format
> adjust the code. Such generated comments would follow the documentation
> style for comments, and so would generate basic, if not wholly described,
> documentation, including information about parameters, return types, class
> members, etc. For example, the following would be generated for the below
> function:
>
>
> /// Do Things
> ///
> /// TODO: Write detailed description
> ///
> /// \param value
> /// \return int
> int doThings(int value) { return value; }
>
>
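The stub generation described above can be sketched without any Clang
dependency. In the sketch below, FunctionSig, Param and makeStubComment are
hypothetical names; in the actual tool the signature would presumably come
from the AST rather than from a hand-filled struct. Unlike the example in the
proposal, this version does not derive a brief line from the function name
and emits a TODO placeholder instead.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of a function signature.
struct Param {
  std::string Type;
  std::string Name;
};

struct FunctionSig {
  std::string ReturnType;
  std::string Name;
  std::vector<Param> Params;
};

// Builds a Doxygen-style stub comment for the given signature.
std::string makeStubComment(const FunctionSig &F) {
  std::string Out;
  Out += "/// TODO: Write brief description\n";
  Out += "///\n";
  Out += "/// TODO: Write detailed description\n";
  if (!F.Params.empty() || F.ReturnType != "void")
    Out += "///\n";
  for (const Param &P : F.Params)
    Out += "/// \\param " + P.Name + "\n";
  // Functions returning void get no \return line.
  if (F.ReturnType != "void")
    Out += "/// \\return " + F.ReturnType + "\n";
  return Out;
}
```

For `int doThings(int value)` this yields a \param line for value and a
\return line for int, close to the example in the proposal.
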
> In addition, the parsing tool could also be expanded to also parse
> Markdown-style comments, using the Apple Swift Markup style as a reference.
>
>
> Please let us know if you have comments or concerns about this proposal.
>
> Thanks!
>
> Julie
Roman
