[cfe-dev] Adding more HTML-related facilities in Doxygen comment parsing

Mon Apr 28 08:11:34 PDT 2014

On Mon, Apr 28, 2014 at 10:14 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
> On Mon, Apr 28, 2014 at 2:56 PM, Alp Toker <alp at nuanti.com> wrote:
>>
>> On 28/04/2014 13:45, Dmitri Gribenko wrote:
>>>
>>> On Mon, Apr 28, 2014 at 1:08 PM, Tobias Grosser <tobias at grosser.es> wrote:
>>>>
>>>> Out of interest, what is required to sanitize HTML?
>>>
>>> There are two different levels of sanitizing:
>>> - well-formedness of HTML,
>>> - absence of javascript.
>>>
>>> The former is harder to guarantee than the latter, but it is important
>>> nevertheless, because being able to directly pass through HTML from
>>> Clang's output into a webpage template and get back a document that
>>> passes validation is a useful property.
>>
>>
>> Dmitri, this may be an interesting problem to solve but it doesn't make
>> sense to build it into libclang.
>>
>> LLVM has no procedure for handling 0-day vulnerabilities, contacting
>> vendors, or pushing updates in coordination with the web community, nor
>> should it. What happens if a 0-day cross-site-scripting attack is found
>> and user passwords are stolen?
>>
>> This is really so far out of scope and mislayered that it's very much a
>> disservice to the few users who might actually use the facility. Why are we
>> building a web technology security validator into clang that is insecure?
>> That's a separate project.
>>
>> Ordinarily you pipe tool output through a well-maintained and up-to-date
>> script that knows about browser and JavaScript quirks. Can we please just
>> point users to that workflow and get on with things?
>
> Parsing Doxygen is inherently intertwined with HTML parsing and
> semantic analysis.  Doing filtering at the same level does not look
> out of scope and mislayered.

Are there no third-party libraries or tools that already do this and
that we could rely on? If not, where do you see this code living
within the overall structure of the compiler? Will it continue to be
part of clangAST like the other comment-related code?
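
As a rough illustration of the two levels listed above (well-formedness
and absence of JavaScript), here is a minimal Python sketch that uses
only the standard library's html.parser module. The names
CommentHTMLChecker, VOID_TAGS, and check() are hypothetical and are not
anything in Clang. A check this naive would likely miss browser quirks,
which is the concern raised earlier in the thread.

```python
from html.parser import HTMLParser

# Hypothetical policy: elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img"}


class CommentHTMLChecker(HTMLParser):
    """Toy checker for the two sanitizing levels (not Clang's code)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open elements
        self.well_formed = True  # level 1: tags are balanced and nested
        self.script_free = True  # level 2: no obvious JavaScript vectors

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_free = False
        for name, value in attrs:
            # Event handlers (onclick, ...) and javascript: URLs run script.
            if name.startswith("on"):
                self.script_free = False
            if value and value.strip().lower().startswith("javascript:"):
                self.script_free = False
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # A closing tag must match the most recently opened element.
        if not self.open_tags or self.open_tags.pop() != tag:
            self.well_formed = False


def check(fragment):
    """Return (well_formed, script_free) for an HTML fragment."""
    checker = CommentHTMLChecker()
    checker.feed(fragment)
    checker.close()
    if checker.open_tags:  # something was opened but never closed
        checker.well_formed = False
    return checker.well_formed, checker.script_free
```

For example, check("<b><i>x</b></i>") reports the fragment as not
well-formed, and check('<a href="javascript:alert(1)">x</a>') reports
it as not script-free.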

~Aaron