[cfe-dev] Adding more HTML-related facilities in Doxygen comment parsing

Mon Apr 28 09:55:29 PDT 2014

On 28/04/2014 17:05, Dmitri Gribenko wrote:
> On Mon, Apr 28, 2014 at 4:57 PM, David Chisnall
> <David.Chisnall at cl.cam.ac.uk> wrote:
>> On 28 Apr 2014, at 16:40, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>>
>>> HTML is a part of Doxygen.  If we are not doing it, then we are
>>> implementing our own documentation language that no other person in
>>> the world cares about.  This is as if someone said, "I don't use
>>> partial specialization of templates in C++, so Clang should not be
>>> implementing it."
>> I think you are missing the point.  The Clang libraries parse C++ into an AST, which is a clang-specific data structure.  That's fine, because there aren't many other libraries that expose C++ AST data structures that users of clang want to interoperate with.  Clang then generates LLVM IR and object code from C++, using well-defined (or, in some cases, poorly defined, but at least vaguely standardised) ABIs.
>>
>> This is in direct contrast to a consumer of documentation, which may want to integrate with one of many different libraries that already provide complex data structures and APIs for handling rich text.
>>
>> Currently, libclang exposes the 'comment AST', which is an unwieldy thing that doesn't seem to address any needs.
> Not only.  It also exposes a cooked comment in XML format with a
> well-defined schema, that preserves the markup and semantic pieces of
> the AST.  You can XSLT that XML into HTML.

So it doesn't address any needs? Good we've cleared that up.

>
>> You also seem to be under the impression that doxygen is the only markup language that is found in [Objective-]C[++] source files.  For Objective-C, Apple's HeaderDoc and GSDoc are more popular, but there are half a dozen other less-popular ones.
> HeaderDoc is sufficiently similar to Doxygen, and in fact, Clang's
> parser is forgiving enough to consume HeaderDoc as well.
>
>> I fully support interfaces in libclang that allow plugins for different comment markup languages, but deciding to hard-code one (and one that is poorly defined and apparently allows all of HTML 5) seems like a terrible idea.
> Doxygen is, more or less, an industry standard (one of, at least).  As
> soon as there is someone who is willing to implement a second comment
> markup language, I am willing to help with factoring.

Nobody is asking for a second comment markup language in clang. This is 
a thread about removing the first one, a change for which there's 
already broad consensus.

This is a very significant cleanup of clang's internals, removing
roughly 15,000 to 20,000 checked-in lines of code. The refactoring will
have minimal impact on users: there will be a callback that allows
Doxygen-like tools to be implemented externally, plus a very simple doc
comment checker in the parser. The biggest challenge is the libclang C
interface, but I'm confident we'll be able to work with that given the
desire to sort out clang's comment system.

What this means in practice is that clang::RawComment will remain,
though it will no longer need to allocate and store duplicate comment
strings ahead of time. The logic to attach comments to declarations
will remain as-is, though it will be possible to override it. The rest
will be split out.

I hope you change your mind about not helping with the factoring because 
it's a lot of work to take on. Perhaps we can preserve some of the 
monolithic comment AST support as an external plugin / library.

Alp.

-- 
http://www.nuanti.com
the browser experts