[cfe-dev] [GSoC] Doxygen documentation with clang

Mon Mar 10 10:59:44 PDT 2014

Dmitri Gribenko <gribozavr at gmail.com> writes:

> On Wed, Mar 5, 2014 at 1:19 PM, Philipp Moeller <bootsarehax at gmail.com> wrote:
>> Hello clang-devel,
>>
>> one of the project ideas for GSoC 2014 is a clang-based tool to
>> generate documentation using doxygen-style comments in the source
>> code.  I wanted to gauge the interest in such a project, see if
>> someone is willing to mentor it, and provide a rough outline of what
>> my idea of the project is. Any feedback on this is very welcome.
>
> Hi Philipp,
>
> Please excuse me for the late reply.
>
> I am very interested in this project and I would be happy to mentor
> it.

Great to hear that. I have done a GSoC already and there are a few
things I thought worked really well with my last mentor. We probably
should go over them separately and see if we have the same idea of how
all this should work.

I'm also not familiar with LLVM project politics, and there is the
question of how many slots LLVM will get and how much promotion and
"lobbying" is necessary to make this happen. I was thinking about
cross-posting my mail from cfe-dev to llvm-dev as well, to see how much
of a response there is.

For a good application I would like to define a certain set of
milestones we want to achieve. If you have anything specific in mind,
please let me know.

> Comments inline.
>
>> 2 Prior Work
>> ════════════
>>
>> • clang already understands doxygen-style comments to a degree and
>>   attaches them to the ast:
>>   [http://llvm.org/devmtg/2012-11/Gribenko_CommentParsing.pdf]
>> • doxygen can already use clang as a backend
>>   [http://comments.gmane.org/gmane.comp.compilers.clang.devel/29490]
>> • there already is a cldoc [https://github.com/jessevdk/cldoc]
>>
>>
>> 3 Project Plan
>> ══════════════
>>
>> 3.1 Fully parse doxygen comments
>> ────────────────────────────────
>>
>> Doxygen supports markdown, HTML entities, if/endif, post-definition
>> documentation, file scope doc, function groups, member groups, pages,
>> page hierarchies, examples, links, auto-links, and todo/bug lists (the
>> dreaded xrefitem).
>>
>> Some of those features might seem like overkill but they usually ended
>> up in doxygen because someone wanted them and they are actively used
>> in "the real world" (c).
>>
>> The CommentParser should do its best to represent those in a useful
>> fashion in the CommentAST (especially link resolving) so tools further
>> down the chain can focus on their tasks only.
>
> Link resolving is pretty important when generating self-contained
> documentation files.  Clang does not attempt to resolve links right
> now.  I expect that implementing this will need a significant time
> investment.

Yes, the way doxygen approaches linking is far from trivial
(auto-linking, namespace guessing, sometimes it considers scope) and
I would allocate a decent amount of time for this.

There would also need to be some way to defer linking when external
projects are involved (possibly by marking certain chunks of a comment
as linkable, but unresolved).
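
To make the "linkable, but unresolved" idea a bit more concrete, here is
a rough sketch in Python. It is only an illustration of the data model I
have in mind, not anything that exists in clang: the names Text,
LinkCandidate and resolve, as well as the symbol tables and target
strings, are all made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Text:
    """Plain comment text that is never a link."""
    content: str


@dataclass
class LinkCandidate:
    """A chunk the parser considers linkable (hypothetical type)."""
    spelling: str                 # the name as written in the comment
    target: Optional[str] = None  # None means "linkable, but unresolved"


Chunk = Union[Text, LinkCandidate]


def resolve(chunks: List[Chunk], symbols: Dict[str, str]) -> List[Chunk]:
    """Resolve the candidates found in `symbols`, keep the rest deferred."""
    result: List[Chunk] = []
    for chunk in chunks:
        if isinstance(chunk, LinkCandidate) and chunk.target is None:
            # symbols.get() returns None for unknown names, so the chunk
            # stays marked as unresolved and can be retried later.
            result.append(LinkCandidate(chunk.spelling,
                                        symbols.get(chunk.spelling)))
        else:
            result.append(chunk)
    return result


comment = [Text("See "), LinkCandidate("ns::Foo"),
           Text(" and "), LinkCandidate("ext::Bar"), Text(".")]

# First pass: only symbols of the current project are known.
local_pass = resolve(comment, {"ns::Foo": "ns/Foo.html"})

# Second pass: symbols of an external project become available.
final_pass = resolve(local_pass, {"ext::Bar": "ext/Bar.html"})
```

The point is that the first pass leaves ext::Bar in place as a link
candidate instead of degrading it to plain text, so a later pass with
more symbol information can still resolve it.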

>
>
> Another important missing feature is attaching comments to macros.
> Currently Clang can only attach comments to declarations.

Is this due to a limitation of the AST or just a feature that you
skipped in your first batch of -Wdocumentation? I always assumed it
shouldn't be too hard, but then that's probably just me being naive.

>
>
>> 3.2 To represent intermediately or not
>> ──────────────────────────────────────
>>
>> The actual documentation generation tool has two options:
>> • use libclang, work on the AST directly and spit out documentation.
>> • let clang produce some intermediate representation (XML?) and work
>>   on this
>>
>> The first option seems to be the easy road but would tie the
>> generation directly into clang. It also seems harder to extend and
>> reuse.
>>
>> The second option is probably the most general approach. Generating XML
>> to represent the AST is actually proposed as its own GSoC project. Maybe
>> it would be possible to produce a reduced XML only containing
>> declarations and comments that could later be extended to feature the
>> full AST. Designing this schema is probably non-trivial and should be
>> well thought through.
>>
>> The slides on -Wdocumentation already mention the ability to produce
>> XML but I couldn't figure out yet how to get that to work. From
>> glancing at the schema in bindings/xml/comment-xml-schema.rng it looks
>> pretty useful already, but some features (header dependencies,
>> inheritance relationships) are AFAIK missing.
>>
>> The main benefit of an intermediate representation would be to enable
>> us to build something akin to doxygen's "external projects" feature,
>> which is incredibly useful (not having it would be a deal-breaker for
>> some of my own projects).
>
> Clang's XML comment representation should accurately represent the
> comment in a way that is:
>
> - extensible and future-proof,
> - allows us to change the AST while maintaining backward
> compatibility.

OK, I didn't feel qualified to make those statements. Knowing you think
the XML is suitable makes it a very viable candidate.

>
> Information like inheritance relationships is omitted on purpose,
> because it is expected that the client is using libclang and can query
> the additional information as needed.

This would make it necessary that the XML carries the information to
build all the translation units that were used to generate the XML in
the first place, correct? Or is libclang able to deserialize the XML to
reconstruct the AST?

If the XML already provided this information we would get looser
coupling, but it wouldn't be as flexible, and every time new, helpful
information is discovered the XML would need to evolve.

I prefer your approach.

>
>> 3.3.3 Database + Web-server
>> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
>>
>> A special case for HTML. Provide a database and a web-frontend that
>> can be hosted. Seems interesting for fast search functions and live
>> documentation updates. clang-server where are you?
>
> This looks like a very promising approach that does not just provide
> the same functionality as Doxygen does, but introduces new value.
> This can actually become the foundation for the clang-server itself!
> The basic functionality for live updates -- tracking dependencies
> between source files, indexing and reindexing will be useful for both
> documentation server and clang-server.

The main question here seems to be how to represent the persistent AST:

- relational DB
- NoSQL
- graph DB

all seem like they could work and I don't have a clear idea how any
of them is going to perform.

Updating the mapping can probably be done at different granularities:
a finer granularity gives better performance but is harder to
implement.
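
As an illustration of the coarse, easy-to-implement end of that
spectrum, here is a small Python sketch of file-level updates: when a
file changes, everything that includes it directly or transitively is
reindexed. The function name files_to_reindex and the include graph are
invented for this example; a real tool would get the graph from the
preprocessor.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def files_to_reindex(includes: Dict[str, Iterable[str]],
                     changed: str) -> Set[str]:
    """Return `changed` plus every file that transitively includes it.

    `includes` maps a file to the files it includes directly.
    """
    # Invert the graph: for each header, who includes it?
    included_by = defaultdict(set)
    for source, headers in includes.items():
        for header in headers:
            included_by[header].add(source)

    # Breadth-first walk from the changed file up to its users.
    dirty = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in included_by[current]:
            if user not in dirty:
                dirty.add(user)
                queue.append(user)
    return dirty


# Hypothetical project layout, for illustration only.
include_graph = {
    "a.cpp": ["a.h", "util.h"],
    "a.h": ["util.h"],
    "b.cpp": ["b.h"],
    "b.h": [],
}

after_util_change = files_to_reindex(include_graph, "util.h")
after_b_change = files_to_reindex(include_graph, "b.h")
```

A finer-grained scheme would track individual declarations instead of
whole files, which should touch fewer entries per update but needs much
more bookkeeping.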

IIRC there used to be design documents for clang-server somewhere on the
web. I'll look for them to get a clearer picture of the
requirements. 

I'm absolutely not averse to working on this, but maybe we should focus
first on improving compatibility with Doxygen and comment parsing and
move on to this topic later. There seems to be plenty of work in the
first stages already.

Cheers,
Philipp