r206882 - Comment parsing: in the generated XML file, mark HTML that is safe to pass

Alp Toker alp at nuanti.com
Tue Apr 22 09:14:03 PDT 2014


On 22/04/2014 13:35, Dmitri Gribenko wrote:
> On Tue, Apr 22, 2014 at 12:45 PM, Alp Toker <alp at nuanti.com> wrote:
>> On 22/04/2014 11:59, Dmitri Gribenko wrote:
>>> +// The list is intentionally organized as one item per line to make it
>>> easier
>>> +// to compare with the HTML spec.
>>> +foreach AttrName = [
>>> +    "onabort",
>>> +    "onblur",
>>> +    "oncancel",
>>> +    "oncanplay",
>>> +    "oncanplaythrough",
>>> +    "onchange",
>>> +    "onclick",
>>> +    "onclose",
>>> +    "oncuechange",
>>> +    "ondblclick",
>>> +    "ondrag",
>>> +    "ondragend",
>>> +    "ondragenter",
>>> +    "ondragexit",
>>> +    "ondragleave",
>>> +    "ondragover",
>>> +    "ondragstart",
>>> +    "ondrop",
>>> +    "ondurationchange",
>>> +    "onemptied",
>>> +    "onended",
>>> +    "onerror",
>>> +    "onfocus",
>>> +    "oninput",
>>> +    "oninvalid",
>>> +    "onkeydown",
>>> +    "onkeypress",
>>> +    "onkeyup",
>>> +    "onload",
>>> +    "onloadeddata",
>>> +    "onloadedmetadata",
>>> +    "onloadstart",
>>> +    "onmousedown",
>>> +    "onmouseenter",
>>> +    "onmouseleave",
>>> +    "onmousemove",
>>> +    "onmouseout",
>>> +    "onmouseover",
>>> +    "onmouseup",
>>> +    "onmousewheel",
>>> +    "onpause",
>>> +    "onplay",
>>> +    "onplaying",
>>> +    "onprogress",
>>> +    "onratechange",
>>> +    "onreset",
>>> +    "onresize",
>>> +    "onscroll",
>>> +    "onseeked",
>>> +    "onseeking",
>>> +    "onselect",
>>> +    "onshow",
>>> +    "onstalled",
>>> +    "onsubmit",
>>> +    "onsuspend",
>>> +    "ontimeupdate",
>>> +    "ontoggle",
>>> +    "onvolumechange",
>>> +    "onwaiting"
>>> +  ] in {
>>> +  def Attr#AttrName : EventHandlerContentAttribute<AttrName>;
>>> +}
>>
>> Hi Dmitri,
>>
>> I've been meaning to bring this up earlier but I really don't think this
>> kind of functionality belongs in the core of the clang AST/Basic.
>>
>> We often ask contributors to add even useful features like warnings and
>> checks to clang-tools-extra or third-party plugins where it might keep
>> things trim. So it's surprising to me that at the same time there's a
>> full-featured HTML5 generator and XML parser/tree being incrementally
>> hard-coded into clang/AST.
> Hi Alp,
>
> Please note that HTML and XML *generation* is in libIndex, not in
> libAST.  Also, no piece of comment parsing (except for the diagnostic
> .td file) lives in libBasic.
>
> Comment parsing, including semantic analysis for HTML embedded in
> comments, is, indeed, in libAST.  Actually, at first, I put it into
> libComment library, because I had the same feeling -- it did not
> belong into libAST, but I had to merge it into libAST because of a
> dependency cycle: comment parsing depends on libAST in order to
> inspect declarations, and libAST (specifically, ASTContext) depends on
> libComment in order to parse comments.
>
> If you do feel strongly about this, we could bounce off an abstract
> base class that is declared in libAST and the only implementation of
> which lives in libComment.  Then libComment would depend on libAST,
> and something in libFrontend, which would depend on both libAST and
> libComment, would instantiate the comment parser appropriately and
> configure ASTContext with the instance.  But I don't think that it is
> a significant cleanup of libAST, it introduced even more complexity.
> All of lib/AST/Comment* is just 3.5 KLoC, include/clang/AST/Comment*
> (including all supporting tables) is less than 3 KLoC.  Compared to
> libAST as a whole, with over 100 KLoC, I don't think it adds much
> value to factor this small piece into a separate library.

Thanks for writing that up Dmitri.

I believe we can get a lot of mileage with some changes to internals.

I've been studying what's there and have some bite-sized suggestions:

1) RawComment::extractBriefText() introduces a dependency from the 
lightweight comment handling into the full comment AST. Could you 
extract this to a function like extractBriefText(RawComment&, ASTContext 
&Context)?

2) Split out comment AST C API in libclang from Index.h into 
CXComment.h. For now, include CXComment.h but this makes it easier to 
develop alternatives (and easier for libclang users to understand the 
separation of concerns).

(... and more items like, you get the idea)

By the way, my numbers don't agree with yours. I have a patch* here to 
remove the doc comment functionality with all other tests passing:

   138 files changed, 655 insertions(+), 17760 deletions(-)

That's about one fifth the size of the whole libAST to use your 
benchmark :-)

While I'm not suggesting we actually apply the patch, it's a great way 
to view the comment system as a whole and identify possible simplifications.

So, if we shake things up a little we can reuse most of this work while 
addressing some of the issues. I'd bee keen to help out on this and have 
patches.

Ultimately I believe the Actions and AST style approach needs to go away 
in favour of something more direct and appropriate for parsing comments 
that can be run after initial AST generation / compilation.

(The patch against r206882 was too big for the list, will look into 
uploading it.)

Alp.


-- 
http://www.nuanti.com
the browser experts





More information about the cfe-commits mailing list