[cfe-dev] [GSoC] Doxygen documentation with clang

Philipp Moeller bootsarehax at gmail.com
Tue Mar 11 05:36:46 PDT 2014


Dmitri Gribenko <gribozavr at gmail.com>
writes:

> On Mon, Mar 10, 2014 at 5:59 PM, Philipp Moeller <bootsarehax at gmail.com> wrote:
>> Dmitri Gribenko <gribozavr at gmail.com> writes:
>>
>>> On Wed, Mar 5, 2014 at 1:19 PM, Philipp Moeller <bootsarehax at gmail.com> wrote:
>>>> Hello clang-devel,
>>>>
>>>> one of the project ideas for GSoC 2014 is a clang-based tool to
>>>> generate documentation using doxygen-style comments in the source
>>>> code.  I wanted to gauge interest in such a project, see if
>>>> someone is willing to mentor it, and provide a rough outline of what
>>>> my idea of the project is. Any feedback on this is very welcome.
>>>
>>> Hi Philipp,
>>>
>>> Please excuse me for the late reply.
>>>
>>> I am very interested in this project and I would be happy to mentor
>>> it.
>>
>> Great to hear that. I have done a GSoC already and there are a few
>> things I thought worked really well with my last mentor. We probably
>> should go over them separately and see if we have the same idea of how
>> all this should work.
>>
>> I'm also not familiar with llvm project politics and there is the
>> question of how many slots llvm will get and how much promotion and
>> "lobbying" is necessary to make this thing happen. I was thinking about
>> cross-posting my clang-dev mail to llvm-dev as well to see how much
>> of a response there is.
>
> I don't think there is much politics involved.

That's great. I'll submit a proposal based on this mail exchange to
Melange today and will keep improving it as we go further.

>
>
>> For a good application I would like to define a certain set of
>> milestones we want to achieve. If you have anything specific in mind,
>> please let me know.
>
> Based on the discussion so far, I think this could be used as a draft plan:
>
> - attaching comments to macros;
> - parsing the reference syntax (recognising that the text from here to
> there is a possible reference, which we will need to resolve).
> Implementing Comment AST representation for unresolved references.
> Designing and implementing the XML representation for unresolved
> references.
> - resolving links to decls within the TU.  The result should probably
> be a Decl* or a USR.  The USR should be available in the XML;
> - defining a schema for a DB to store information about possible link
> targets (declarations and macros);
> - populating DB with information from TUs in the project;
> - resolving links to decls cross-TU using the DB.  The result should
> be a USR, and maybe the source file name + source location.
>
> Does this sound reasonable?  What do you think?

The first three stages seem very self-contained and we can probably add
them independently.
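As a rough illustration of the reference-parsing stage, here is a minimal C++ sketch that only recognizes Doxygen's explicit `\ref name` / `@ref name` commands and records each name as unresolved text plus an offset. `UnresolvedRef` and `findRefCommands` are hypothetical names, not existing Clang classes, and auto-linking of bare words is not covered.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Comment AST payload: text that looks like a reference but
// has not been resolved to a declaration yet.
struct UnresolvedRef {
  std::string name;    // spelling as written, e.g. "ns::Widget"
  std::size_t offset;  // byte offset of the name within the comment text
};

// Characters accepted in a (possibly qualified) name.
static bool isNameChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == ':';
}

static bool isSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Finds Doxygen "\ref name" and "@ref name" commands in a comment body and
// records each name as an unresolved reference.
std::vector<UnresolvedRef> findRefCommands(const std::string &text) {
  std::vector<UnresolvedRef> refs;
  for (std::size_t i = 0; i + 4 <= text.size(); ++i) {
    if ((text[i] != '\\' && text[i] != '@') || text.compare(i + 1, 3, "ref") != 0)
      continue;
    std::size_t j = i + 4;
    // Require whitespace after the command so "\reference" is not matched.
    if (j >= text.size() || !isSpace(text[j]))
      continue;
    while (j < text.size() && isSpace(text[j]))
      ++j;
    std::size_t start = j;
    while (j < text.size() && isNameChar(text[j]))
      ++j;
    // Drop trailing colons, e.g. "see \ref foo: it ...".
    while (j > start && text[j - 1] == ':')
      --j;
    if (j > start)
      refs.push_back({text.substr(start, j - start), start});
  }
  return refs;
}
```

For `See \ref ns::Widget and @ref frobnicate.` it yields `ns::Widget` and `frobnicate`, which the next stage would then try to resolve to a Decl* or USR.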

Designing the database for link resolution seems to be the biggest
challenge, especially since it should be future-proof, as you mention in
your last comment. I'll be working on designing some preliminary schema
so there is something more substantial to discuss.
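To make the schema discussion concrete, here is a minimal sketch of what one link-target table might hold, with an in-memory multimap standing in for the sqlite table. The field set (USR, qualified name, file, line) is an assumption drawn from the draft plan above; `LinkTarget` and `LinkTargetIndex` are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical row of the link-target table: one per declaration or macro
// that a documentation comment could link to.
struct LinkTarget {
  std::string usr;            // Clang USR, identifies the entity across TUs
  std::string qualifiedName;  // e.g. "ns::Widget"
  std::string file;           // file containing the declaration
  unsigned line;              // line of the declaration in that file
};

// In-memory stand-in for the database table, indexed by qualified name.
// A multimap, because overloads share one qualified name.
class LinkTargetIndex {
public:
  // Adds the targets collected from one TU. Headers are seen by every TU
  // that includes them, so an already-known USR is skipped.
  void addTranslationUnit(const std::vector<LinkTarget> &targets) {
    for (const LinkTarget &t : targets)
      if (knownUSRs.insert(t.usr).second)
        byName.emplace(t.qualifiedName, t);
  }

  // Returns every target with exactly this qualified name.
  std::vector<LinkTarget> lookup(const std::string &qualifiedName) const {
    std::vector<LinkTarget> result;
    auto range = byName.equal_range(qualifiedName);
    for (auto it = range.first; it != range.second; ++it)
      result.push_back(it->second);
    return result;
  }

private:
  std::multimap<std::string, LinkTarget> byName;
  std::set<std::string> knownUSRs;
};
```

Deduplicating on the USR keeps a header's declarations from being stored once per including TU, while the multimap keeps overloads that share a qualified name; choosing among those is left to the resolver.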

> This already looks like a lot of work, so I am not sure if actually
> writing a tool that is going to produce HTML or LaTeX is going to fit
> in...  Maybe only a skeleton of such a tool.

I agree. The proposal I'll upload talks about a very basic HTML and
possibly a LaTeX generator to outline how the functionality can be used
to build a more general purpose tool.

>>>> 2 Prior Work
>>>> ════════════
>>>>
>>>> • clang already understands doxygen-style comments to a degree and
>>>>   attaches them to the AST:
>>>>   [http://llvm.org/devmtg/2012-11/Gribenko_CommentParsing.pdf]
>>>> • doxygen can already use clang as a backend
>>>>   [http://comments.gmane.org/gmane.comp.compilers.clang.devel/29490]
>>>> • there already is a cldoc [https://github.com/jessevdk/cldoc]
>>>>
>>>>
>>>> 3 Project Plan
>>>> ══════════════
>>>>
>>>> 3.1 Fully parse doxygen comments
>>>> ────────────────────────────────
>>>>
>>>> Doxygen supports markdown, HTML entities, if/endif, post-definition
>>>> documentation, file scope doc, function groups, member groups, pages,
>>>> page hierarchies, examples, links, auto-links, and todo/bug lists (the
>>>> dreaded xrefitem).
>>>>
>>>> Some of those features might seem like overkill but they usually ended
>>>> up in doxygen because someone wanted them and they are actively used
>>>> in "the real world" (c).
>>>>
>>>> The CommentParser should do its best to represent those in a useful
>>>> fashion in the CommentAST (especially link resolving) so tools further
>>>> down the chain can focus on their tasks only.
>>>
>>> Link resolving is pretty important when generating self-contained
>>> documentation files.  Clang does not attempt to resolve links right
>>> now.  I expect that implementing this will need a significant time
>>> investment.
>>
>> Yes, the way doxygen approaches linking is far from trivial
>> (auto-linking, namespace guessing, sometimes it considers scope) and
>> I would allocate a decent amount of time for this.
>>
>> There also would need to be some way to defer linking when external
>> projects are involved (possibly marking certain chunks of a comment as
>> linkable, but unresolved).
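One simplified way to approximate "namespace guessing" is to try the name in the comment's enclosing scope and then in each outer scope, the way C++ unqualified lookup walks outward. This is an assumption for illustration, not Doxygen's actual algorithm. `resolveInScope` is a hypothetical helper, and a `std::nullopt` result is what would be marked linkable but unresolved and deferred to a cross-TU or external-project pass.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>

// Resolves a name written in a comment located in `scope` (e.g. "a::b")
// against a set of known fully qualified names, innermost scope first.
std::optional<std::string> resolveInScope(const std::set<std::string> &known,
                                          std::string scope,
                                          const std::string &name) {
  // "::foo" is explicitly global: only the exact global name matches.
  if (name.rfind("::", 0) == 0) {
    std::string global = name.substr(2);
    if (known.count(global))
      return global;
    return std::nullopt;
  }
  while (true) {
    std::string candidate = scope.empty() ? name : scope + "::" + name;
    if (known.count(candidate))
      return candidate;
    if (scope.empty())
      return std::nullopt;  // leave unresolved for a later pass
    // Drop the innermost scope component: "a::b" -> "a", "a" -> "".
    std::size_t pos = scope.rfind("::");
    scope = (pos == std::string::npos) ? std::string() : scope.substr(0, pos);
  }
}
```

With `a::b::foo`, `a::bar`, `a::baz` and `baz` known, `bar` written inside `a::b` resolves to `a::bar`, and `baz` resolves to `a::baz` rather than the global `baz`, because an inner match shadows an outer one.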
>>
>>>
>>>
>>> Another important missing feature is attaching comments to macros.
>>> Currently Clang can only attach comments to declarations.
>>
>> Is this due to a limitation of the AST or just a feature that you
>> skipped in your first batch of -Wdocumentation? I always assumed it
>> shouldn't be too hard, but then that's probably just me being naive.
>
> It just involves a completely different code path, through
> Preprocessor.  I don't expect implementing it to be too hard, but
> probably not trivial either, and probably involving a lot of plumbing
> throughout.

Sounds like a perfect first task to tackle.
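The real implementation would presumably hook into the Preprocessor (for example via `clang::PPCallbacks::MacroDefined`) rather than scan text. The C++ sketch below only illustrates one possible attachment rule on raw source text, under stated simplifications: a run of `///` lines attaches to an immediately following `#define`, and a blank or any other line breaks the association. `attachMacroComments` is a hypothetical helper.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Maps macro name -> text of the "///" block directly above its #define.
// Deliberately naive: ignores /** */ blocks, line continuations, and
// conditional compilation.
std::map<std::string, std::string> attachMacroComments(const std::string &source) {
  std::map<std::string, std::string> result;
  std::istringstream in(source);
  std::string line, pending;
  while (std::getline(in, line)) {
    std::size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos) {  // blank line breaks the attachment
      pending.clear();
      continue;
    }
    if (line.compare(first, 3, "///") == 0) {
      std::string body = line.substr(first + 3);
      if (!body.empty() && body[0] == ' ')
        body.erase(0, 1);
      pending += pending.empty() ? body : "\n" + body;
      continue;
    }
    if (line.compare(first, 7, "#define") == 0 && !pending.empty()) {
      std::size_t nameStart = line.find_first_not_of(" \t", first + 7);
      std::size_t nameEnd = line.find_first_of(" \t(", nameStart);
      if (nameStart != std::string::npos)
        result[line.substr(nameStart, nameEnd - nameStart)] = pending;
    }
    pending.clear();  // any other line ends the pending comment block
  }
  return result;
}
```

The same "nothing in between" rule is roughly what a Preprocessor-based version would need to check using source locations instead of lines.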

>>> Information like inheritance relationships is omitted on purpose,
>>> because it is expected that the client is using libclang and can query
>>> the additional information as needed.
>>
>> This would make it necessary that the XML carries the information to
>> build all the translation units that were used to generate the XML in
>> the first place, correct? Or is libclang able to deserialize the XML to
>> reconstruct the AST?
>>
>> If the XML already provided this information, we would get looser
>> coupling, but it wouldn't be as flexible and every time new, helpful
>> information is discovered the XML would need to evolve.
>
> Sorry, I did not explain clearly.  Just to clear any possible misunderstandings:
> - XML format is only for comments, not C, C++, Objective-C ASTs.
> - XML format is not reversible to comment ASTs.
>
> Currently clients already have a TranslationUnit when they query it
> for the XML representation of the comment.  XML is optimised for the
> IDE use case, where the XML will be rendered into some rich text view
> in the IDE.  If the client needs extra information, it can query
> it with very little overhead, because the TranslationUnit is already
> in memory, and all the parsing and semantic analysis work was done.
>
> OTOH, if we decide on a more offline approach, where comments in
> XML format are stored after the TranslationUnit is destroyed, then we
> either need to store more indexing info out-of-band, or add optional
> pieces to the XML with that information.

Thanks, I was under the impression that the XML should represent at
least a subset of the AST and that the XML should be the sole input to a
documentation generator. This would obviously require it to contain much
more information.
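For the offline case, one option is to carry the extra indexing data as optional attributes on a reference element, which IDE clients can simply ignore. A minimal sketch of such an emitter with XML escaping; the `Reference` element and its `spelling`/`usr`/`file` attributes are hypothetical and not part of Clang's existing comment XML, and an unresolved reference is simply one without a `usr`.

```cpp
#include <optional>
#include <string>

// Escapes the five XML special characters for use in attribute values.
std::string xmlEscape(const std::string &s) {
  std::string out;
  for (char c : s) {
    switch (c) {
    case '&': out += "&amp;"; break;
    case '<': out += "&lt;"; break;
    case '>': out += "&gt;"; break;
    case '"': out += "&quot;"; break;
    case '\'': out += "&apos;"; break;
    default: out += c;
    }
  }
  return out;
}

// Emits a hypothetical <Reference/> element. `usr` and `file` are the
// optional out-of-band pieces an offline consumer would need.
std::string refToXml(const std::string &spelling,
                     const std::optional<std::string> &usr,
                     const std::optional<std::string> &file) {
  std::string xml = "<Reference spelling=\"" + xmlEscape(spelling) + "\"";
  if (usr)
    xml += " usr=\"" + xmlEscape(*usr) + "\"";
  if (file)
    xml += " file=\"" + xmlEscape(*file) + "\"";
  xml += "/>";
  return xml;
}
```

Keeping the extra data in optional attributes means the IDE-oriented output stays unchanged when no index is available.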

>>>> 3.3.3 Database + Web-server
>>>> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
>>>>
>>>> A special case for HTML. Provide a database and a web-frontend that
>>>> can be hosted. Seems interesting for fast search functions and live
>>>> documentation updates. clang-server where are you?
>>>
>>> This looks like a very promising approach that does not just provide
>>> the same functionality as Doxygen does, but introduces new value.
>>> This can actually become the foundation for the clang-server itself!
>>> The basic functionality for live updates -- tracking dependencies
>>> between source files, indexing and reindexing will be useful for both
>>> documentation server and clang-server.
>>
>> The main question here seems to be how to represent the persistent AST:
>>
>> - relational DB
>> - NoSQL
>> - graph DB
>>
>> all seem like they could work and I don't have a clear idea how any
>> of them is going to perform.
>
> Do you have any previous experience with databases, or a particular
> preference?  I guess that if we use a portable subset of sqlite, then
> the tool would be able to run on a wide variety of systems and be
> extremely easy to set up, while leaving the possibility of using a
> more heavyweight database in future if needed.

Most of my database work has been with sqlite and it seems the most
portable of all the options and is also the least hassle for users.

I'll allocate some time in the schema design phase of the database to
research some alternatives more closely.

>> Updating the mapping can probably be done at different granularities,
>> with finer granularity giving better performance but being harder to
>> implement.
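A minimal sketch of the coarsest granularity, per file: remember a hash of each file's contents and reindex only files whose hash changed. `FileIndexState` is hypothetical. Note that `std::hash` is only guaranteed to be consistent within a single program run, so a hash stored in the database across runs would need a stable digest instead; deleted files are also not handled here.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Per-file change detection for reindexing (hypothetical helper).
class FileIndexState {
public:
  // Takes path -> current contents, returns the paths that are new or whose
  // contents changed since the previous call, and records the new hashes.
  std::vector<std::string>
  changedFiles(const std::map<std::string, std::string> &contents) {
    std::vector<std::string> changed;
    for (const auto &[path, text] : contents) {
      std::size_t h = std::hash<std::string>{}(text);
      auto it = hashes.find(path);
      if (it == hashes.end() || it->second != h) {
        changed.push_back(path);
        hashes[path] = h;
      }
    }
    return changed;
  }

private:
  // std::hash is only stable within one run; persisting these values
  // across runs would need a stable digest instead.
  std::map<std::string, std::size_t> hashes;
};
```

Finer granularity (per declaration) would reindex less after a small edit, at the cost of tracking which rows came from which part of which file.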
>>
>> IIRC there used to be design documents for clang-server somewhere on the
>> web. I'll look for them to get a clearer picture of the
>> requirements.
>
> Feel free to ask me questions about this.
>
>> I'm absolutely not averse to working on this, but maybe we should focus
>> on first improving compatibility with Doxygen and comment parsing and
>> move into this topic later. There seems to be plenty of work in the first
>> stages already.
>
> I completely agree, but resolving links cross-TU will require doing
> some indexing of the source files.  Certainly, in a Doxygen-related
> GSoC we are not going to do any incremental indexing, and we are not
> going to record more information than needed for this application, but
> we should design the DB schema in a way that allows us to do this in
> future.
