[cfe-dev] [GSoC] Doxygen documentation with clang

Mon Mar 10 15:54:54 PDT 2014

Dear Dmitri, Philipp,

Regarding your (very interesting) project proposal, I have also been working on documentation using libclang. I have followed an approach that goes beyond Doxygen's capabilities.
I'd like to share with you my results so far. 
http://jlopezvi.github.io/Flowgen/

Looking forward to your thoughts.
Best regards,

Juan Lopez-Villarejo

________________________________________
From: cfe-dev-bounces at cs.uiuc.edu [cfe-dev-bounces at cs.uiuc.edu] on behalf of Dmitri Gribenko [gribozavr at gmail.com]
Sent: 10 March 2014 20:43
To: Philipp Moeller
Cc: Clang Developers
Subject: Re: [cfe-dev] [GSoC] Doxygen documentation with clang

On Mon, Mar 10, 2014 at 5:59 PM, Philipp Moeller <bootsarehax at gmail.com> wrote:
> Dmitri Gribenko <gribozavr at gmail.com> writes:
>
>> On Wed, Mar 5, 2014 at 1:19 PM, Philipp Moeller <bootsarehax at gmail.com> wrote:
>>> Hello clang-devel,
>>>
>>> one of the project ideas for GSoC 2014 is a clang-based tool to
>>> generate documentation using doxygen-style comments in the source
>>> code.  I wanted to gauge interest in such a project, see if
>>> someone is willing to mentor it, and provide a rough outline of what
>>> my idea of the project is. Any feedback on this is very welcome.
>>
>> Hi Philipp,
>>
>> Please excuse me for the late reply.
>>
>> I am very interested in this project and I would be happy to mentor
>> it.
>
> Great to hear that. I have done a GSoC already and there are a few
> things I thought worked really well with my last mentor. We probably
> should go over them separately and see if we have the same idea of how
> all this should work.
>
> I'm also not familiar with LLVM project politics, and there is the
> question of how many slots LLVM will get and how much promotion and
> "lobbying" is necessary to make this thing happen. I was thinking about
> cross-posting my mail from clang-dev to llvm-dev as well to see how much
> of a response there is.

I don't think there is much politics involved.

> For a good application I would like to define a certain set of
> milestones we want to achieve. If you have anything specific in mind,
> please let me know.

Based on the discussion so far, I think this could be used as a draft plan:

- attaching comments to macros;
- parsing the reference syntax (recognising that the text from here to
there is a possible reference, which we will need to resolve).
Implementing a Comment AST representation for unresolved references.
Designing and implementing the XML representation for unresolved
references.
- resolving links to decls within the TU.  The result should probably
be a Decl* or a USR.  The USR should be available in the XML;
- defining a schema for a DB to store information about possible link
targets (declarations and macros);
- populating the DB with information from TUs in the project;
- resolving links to decls cross-TU using the DB.  The result should
be a USR, and maybe the source file name + source location.
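
As a rough illustration of the second item, here is a minimal C sketch of the recognition step. It assumes that only explicit `\ref` / `@ref` commands mark a possible reference (Doxygen's auto-linking is ignored), and `UnresolvedRef` and `scan_refs` are hypothetical names, not part of Clang's Comment AST. Each hit records a byte range in the comment text that a later pass would resolve.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for "the text from here to there is a possible
 * reference": a byte range in the comment text, resolved in a later pass. */
typedef struct {
    size_t begin;   /* offset of the first byte of the target name */
    size_t length;  /* length of the target name in bytes */
    int resolved;   /* stays 0 until a declaration is found */
} UnresolvedRef;

/* Characters accepted in a target name such as "ns::Foo" or "~Foo". */
static int is_name_char(char c) {
    return isalnum((unsigned char)c) || c == '_' || c == ':' || c == '~';
}

/* Finds every "\ref Name" or "@ref Name" in `text`. Stores at most
 * `max_refs` records in `refs` and returns the total number found. */
static size_t scan_refs(const char *text, UnresolvedRef *refs,
                        size_t max_refs) {
    assert(text != NULL);
    size_t n = strlen(text);
    size_t count = 0;

    for (size_t i = 0; i + 4 <= n; ++i) {
        if ((text[i] != '\\' && text[i] != '@') ||
            strncmp(text + i + 1, "ref", 3) != 0)
            continue;
        size_t j = i + 4;
        /* Require whitespace after the command, so "\refs" is not a match. */
        if (j >= n || !isspace((unsigned char)text[j]))
            continue;
        while (j < n && isspace((unsigned char)text[j]))
            ++j;
        size_t start = j;
        while (j < n && is_name_char(text[j]))
            ++j;
        if (j == start)
            continue;  /* "\ref" followed by no name */
        if (count < max_refs) {
            refs[count].begin = start;
            refs[count].length = j - start;
            refs[count].resolved = 0;
        }
        ++count;
    }
    return count;
}
```

Recording ranges instead of resolving on the spot keeps comment parsing independent of name lookup, which is what would let the within-TU and cross-TU steps fill in the targets later.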

Does this sound reasonable?  What do you think?

This already looks like a lot of work, so I am not sure if actually
writing a tool that is going to produce HTML or LaTeX is going to fit
in...  Maybe only a skeleton of such a tool.

>>> 2 Prior Work
>>> ════════════
>>>
>>> • clang already understands doxygen-style comments to a degree and
>>>   attaches them to the AST:
>>>   [http://llvm.org/devmtg/2012-11/Gribenko_CommentParsing.pdf]
>>> • doxygen can already use clang as a backend
>>>   [http://comments.gmane.org/gmane.comp.compilers.clang.devel/29490]
>>> • there already is cldoc [https://github.com/jessevdk/cldoc]
>>>
>>>
>>> 3 Project Plan
>>> ══════════════
>>>
>>> 3.1 Fully parse doxygen comments
>>> ────────────────────────────────
>>>
>>> Doxygen supports markdown, HTML entities, if/endif, post-definition
>>> documentation, file scope doc, function groups, member groups, pages,
>>> page hierarchies, examples, links, auto-links, and todo/bug lists (the
>>> dreaded xrefitem).
>>>
>>> Some of those features might seem like overkill but they usually ended
>>> up in doxygen because someone wanted them and they are actively used
>>> in "the real world" (c).
>>>
>>> The CommentParser should do its best to represent those in a useful
>>> fashion in the CommentAST (especially link resolving) so tools further
>>> down the chain can focus on their tasks only.
>>
>> Link resolving is pretty important when generating self-contained
>> documentation files.  Clang does not attempt to resolve links right
>> now.  I expect that implementing this will need a significant time
>> investment.
>
> Yes, the way doxygen approaches linking is far from trivial
> (auto-linking, namespace guessing, sometimes it considers scope) and
> I would allocate a decent amount of time for this.
>
> There also would need to be some way to defer linking when external
> projects are involved (possibly marking certain chunks of a comment as
> linkable, but unresolved).
>
>>
>>
>> Another important missing feature is attaching comments to macros.
>> Currently Clang can only attach comments to declarations.
>
> Is this due to a limitation of the AST or just a feature that you
> skipped in your first batch of -Wdocumentation? I always assumed it
> shouldn't be too hard, but then that's probably just me being naive.

It just involves a completely different code path, through the
Preprocessor.  I don't expect implementing it to be too hard, but it is
probably not trivial either, and it will probably involve a lot of
plumbing throughout.

>> Information like inheritance relationships is omitted on purpose,
>> because it is expected that the client is using libclang and can query
>> the additional information as needed.
>
> This would require the XML to carry the information needed to
> build all the translation units that were used to generate the XML in
> the first place, correct? Or is libclang able to deserialize the XML to
> reconstruct the AST?
>
> If the XML already provided this information we would get looser
> coupling, but it wouldn't be as flexible: every time new, helpful
> information is discovered, the XML would need to evolve.

Sorry, I did not explain clearly.  Just to clear up any possible misunderstandings:
- XML format is only for comments, not C, C++, Objective-C ASTs.
- XML format is not reversible to comment ASTs.

Currently clients already have a TranslationUnit when they query it
for the XML representation of the comment.  The XML is optimised for the
IDE use case, where it will be rendered into some rich text view
in the IDE.  If the client needs extra information, it can query
it with very little overhead, because the TranslationUnit is already
in memory, and all the parsing and semantic analysis work has been done.

OTOH, if we decide on a more offline approach, where comments in
XML format are stored after the TranslationUnit is destroyed, then we
either need to store more indexing info out-of-band, or add optional
pieces to the XML with that information.
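
For the second option, one way the optional pieces could look, as a small C sketch: a reference element that always carries the reference text and carries a USR and location only when they are known. The `<Ref>` element, its `usr`/`file`/`line` attributes and `emit_ref_xml` are hypothetical, not part of Clang's existing comment XML schema.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends `s` to the NUL-terminated string in `buf` (capacity `cap`).
 * If it does not fit, copies what fits; once the buffer is full, later
 * appends add nothing. A real emitter would grow the buffer instead. */
static void append_raw(char *buf, size_t cap, const char *s) {
    size_t used = strlen(buf);
    size_t n = strlen(s);
    if (used + n >= cap)
        n = cap - 1 - used;
    memcpy(buf + used, s, n);
    buf[used + n] = '\0';
}

/* Like append_raw, but escapes characters special in XML text and
 * attribute values. */
static void append_escaped(char *buf, size_t cap, const char *s) {
    for (; *s != '\0'; ++s) {
        char one[2] = { *s, '\0' };
        const char *piece = one;
        switch (*s) {
        case '&': piece = "&amp;"; break;
        case '<': piece = "&lt;"; break;
        case '>': piece = "&gt;"; break;
        case '"': piece = "&quot;"; break;
        default: break;
        }
        append_raw(buf, cap, piece);
    }
}

/* Hypothetical element for one reference in a comment. The reference text
 * is always present; USR and location are optional extras for consumers
 * that no longer have the TranslationUnit in memory. */
static void emit_ref_xml(char *buf, size_t cap, const char *text,
                         const char *usr, const char *file, unsigned line) {
    assert(buf != NULL && cap > 0 && text != NULL);
    buf[0] = '\0';
    append_raw(buf, cap, "<Ref");
    if (usr != NULL) {
        append_raw(buf, cap, " usr=\"");
        append_escaped(buf, cap, usr);
        append_raw(buf, cap, "\"");
    }
    if (file != NULL) {
        char num[32];
        append_raw(buf, cap, " file=\"");
        append_escaped(buf, cap, file);
        snprintf(num, sizeof num, "\" line=\"%u\"", line);
        append_raw(buf, cap, num);
    }
    append_raw(buf, cap, ">");
    append_escaped(buf, cap, text);
    append_raw(buf, cap, "</Ref>");
}
```

An unresolved reference simply comes out without the optional attributes, so the same element could serve both the IDE case and the offline case.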

>>> 3.3.3 Database + Web-server
>>> ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
>>>
>>> A special case for HTML. Provide a database and a web-frontend that
>>> can be hosted. Seems interesting for fast search functions and live
>>> documentation updates. clang-server where are you?
>>
>> This looks like a very promising approach that does not just provide
>> the same functionality as Doxygen does, but introduces new value.
>> This can actually become the foundation for the clang-server itself!
>> The basic functionality for live updates -- tracking dependencies
>> between source files, indexing and reindexing will be useful for both
>> documentation server and clang-server.
>
> The main question here seems to be how to represent the persistent AST:
>
> - relational DB
> - NoSQL
> - graph DB
>
> All seem like they could work, and I don't have a clear idea how any
> of them is going to perform.

Do you have any previous experience with databases, or a particular
preference?  I guess that if we use a portable subset of sqlite, then
the tool would be able to run on a wide variety of systems and would be
extremely easy to set up, while leaving open the possibility of using a
more heavyweight database in future if needed.

> Updating the mapping can probably be done at different granularities,
> with finer granularity giving better performance but being harder to
> implement.
>
> IIRC there used to be design documents for clang-server somewhere on the
> web. I'll look for them to get a clearer picture of the
> requirements.

Feel free to ask me questions about this.

> I'm absolutely not averse to working on this, but maybe we should focus
> on first improving compatibility with Doxygen and comment parsing and
> move into this topic later. There seems to be plenty of work in the first
> stages already.

I completely agree, but resolving links cross-TU will require doing
some indexing of the source files.  Certainly, in a Doxygen-related
GSoC we are not going to do any incremental indexing, and we are not
going to record more information than needed for this application, but
we should design the DB schema in a way that allows us to do this in
future.
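
A minimal sketch of the cross-TU lookup, assuming the DB has already been populated with one row per possible link target. A plain C array stands in for the (e.g. sqlite) table here, and `LinkTarget`, `resolve_ref` and the USR-style strings in the example are hypothetical. It tries enclosing scopes from the innermost outwards, approximating the scope-aware resolution discussed earlier in the thread, and returns the USR plus file name and line, as in the draft plan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One row of a hypothetical link-target table. In the proposal this would
 * live in the DB, populated from every TU in the project. */
typedef struct {
    const char *qualified_name;  /* e.g. "ns::Bar::baz" */
    const char *usr;             /* what a resolved link should carry */
    const char *file;
    unsigned line;
} LinkTarget;

/* Linear scan standing in for an indexed DB query on qualified_name. */
static const LinkTarget *lookup_exact(const LinkTarget *index, size_t n,
                                      const char *qname) {
    for (size_t i = 0; i < n; ++i)
        if (strcmp(index[i].qualified_name, qname) == 0)
            return &index[i];
    return NULL;
}

/* Resolves reference text `ref` written in scope `scope` ("" = global) by
 * trying enclosing scopes from the innermost outwards, roughly like C++
 * unqualified lookup. A leading "::" forces a global lookup.
 * Returns NULL if the reference stays unresolved. */
static const LinkTarget *resolve_ref(const LinkTarget *index, size_t n,
                                     const char *scope, const char *ref) {
    char prefix[256];
    char candidate[512];

    assert(scope != NULL && ref != NULL);
    if (strncmp(ref, "::", 2) == 0)
        return lookup_exact(index, n, ref + 2);

    snprintf(prefix, sizeof prefix, "%s", scope);
    for (;;) {
        if (prefix[0] != '\0')
            snprintf(candidate, sizeof candidate, "%s::%s", prefix, ref);
        else
            snprintf(candidate, sizeof candidate, "%s", ref);

        const LinkTarget *hit = lookup_exact(index, n, candidate);
        if (hit != NULL || prefix[0] == '\0')
            return hit;

        /* Drop the innermost scope: "ns::Bar" -> "ns" -> "". */
        char *last = NULL;
        for (char *p = strstr(prefix, "::"); p != NULL; p = strstr(p + 2, "::"))
            last = p;
        if (last != NULL)
            *last = '\0';
        else
            prefix[0] = '\0';
    }
}
```

Splitting on "::" ignores template arguments and using-directives, so this only approximates real lookup; keeping the USR, file and line in each row is what lets a resolved link be emitted without the original TU in memory.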

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/

_______________________________________________
cfe-dev mailing list
cfe-dev at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev