[cfe-dev] Adding more HTML-related facilities in Doxygen comment parsing

Aaron Ballman aaron at aaronballman.com
Mon Apr 28 08:40:02 PDT 2014


On Mon, Apr 28, 2014 at 11:23 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
> On Mon, Apr 28, 2014 at 4:11 PM, Aaron Ballman <aaron at aaronballman.com> wrote:
>> On Mon, Apr 28, 2014 at 10:14 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>>> Parsing Doxygen is inherently intertwined with HTML parsing and
>>> semantic analysis.  Doing filtering at the same level does not look
>>> out of scope or mislayered.
>>
>> There are no 3rd party libraries or tools which already do this which
>> we could then rely on? If not, where do you see this code living
>> within the overall structure of the compiler? Will it continue to be a
>> part of clangAST like the other comment-related code?
>
> Hi Aaron,
>
> As I explained in the first message in this thread, libtidy would
> technically work, except: (1) it was never updated for HTML5, has no
> formal releases, and is probably unmaintained, and (2) constructing
> an HTML DOM just to check the tag name is a superfluous exercise in
> using a library just for the sake of using a library, and it will not
> deliver good performance either.  Apart from libtidy, I am not aware
> of other libraries with suitable functionality and licence.
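
To make point (2) concrete, here is a minimal sketch of checking a tag
name against a static table instead of building a DOM. It is not taken
from the patch under discussion: the tag list and the name
isAllowedHTMLTagName are hypothetical, and the sketch assumes the name
has already been lowercased.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical allowlist for illustration only; a real table would be
// derived from the HTML specification. The entries must stay sorted,
// because std::binary_search requires a sorted range.
constexpr std::array<std::string_view, 6> AllowedTags = {
    "a", "b", "code", "em", "p", "pre"};

// Returns true if Name appears in the allowlist. No DOM is constructed;
// the check is a binary search over a constant table.
// Assumption: Name is already lowercase.
bool isAllowedHTMLTagName(std::string_view Name) {
  return std::binary_search(AllowedTags.begin(), AllowedTags.end(), Name);
}
```

With this table, isAllowedHTMLTagName("code") is true and
isAllowedHTMLTagName("script") is false.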

Fair (I missed that entire paragraph originally... sorry!).

> The HTML tables and helpers can be factored out somewhere into
> clangBasic, clangHTML or even llvmSupport or llvmHTML -- this is a
> bikeshed that I mostly don't care about.

I don't believe this to be a bikeshed at all; it's actually my primary
concern at this point.

>  But Doxygen-specific
> semantic analysis of HTML, as illustrated in the first post, has to
> live in the same library as comment parsing.
>
> About comment parsing living in libAST -- it is possible to move it to
> a separate library, but we would have to bounce off an abstract base
> class to untie the circular dependency between ASTContext and that
> libClangComment.  Because comment parsing currently lives in
> completely separate files in libAST, which are clearly named as such,
> I don't think that comment parsing is a burden for Clang developers
> working on libAST.
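
The "bounce off an abstract base class" idea can be sketched roughly as
follows. This is an illustration under assumptions, not Clang's actual
API: Context stands in for ASTContext, and CommentParserBase,
DoxygenCommentParser, setCommentParser and getBrief are hypothetical
names. The AST side depends only on the interface, so the comment
library can depend on the AST library without a cycle.

```cpp
#include <memory>
#include <string>
#include <utility>

// --- Would live in the AST library (hypothetical names) ---

// Abstract interface: the only thing the AST library knows about
// comment parsing.
class CommentParserBase {
public:
  virtual ~CommentParserBase() = default;
  virtual std::string parseBrief(const std::string &RawComment) = 0;
};

// Stand-in for ASTContext. It holds the parser through the interface
// and never names the concrete implementation.
class Context {
  std::unique_ptr<CommentParserBase> Parser;

public:
  void setCommentParser(std::unique_ptr<CommentParserBase> P) {
    Parser = std::move(P);
  }

  // Returns an empty string when no parser has been installed.
  std::string getBrief(const std::string &RawComment) {
    return Parser ? Parser->parseBrief(RawComment) : std::string();
  }
};

// --- Would live in the separate comment library ---

// Toy implementation: strips a leading "/// " marker and returns the rest.
class DoxygenCommentParser : public CommentParserBase {
public:
  std::string parseBrief(const std::string &RawComment) override {
    const std::string Prefix = "/// ";
    if (RawComment.compare(0, Prefix.size(), Prefix) == 0)
      return RawComment.substr(Prefix.size());
    return RawComment;
  }
};
```

A client that links both libraries would install the parser with
Ctx.setCommentParser(std::make_unique<DoxygenCommentParser>()); a build
without the comment library would simply leave it unset.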

I am mostly concerned about factoring this out such that it can be
disabled via build flags when building Clang. I've not seen much
information about just how far down the rabbit hole this sanitization
will go (for instance, if there's a DTD, will it be followed?), so I
am worried about security implications from this. I also agree with
Alp that this feels like a scope issue -- why is a C-family compiler
getting HTML validation + sanitization as part of its core components
(increasing maintenance burden, review burdens, etc)?

FWIW, if there was a way to turn Doxygen support of this nature into a
plugin, my complaints would vanish.

~Aaron


