r206882 - Comment parsing: in the generated XML file, mark HTML that is safe to pass
Alp Toker
alp at nuanti.com
Tue Apr 22 09:14:03 PDT 2014
On 22/04/2014 13:35, Dmitri Gribenko wrote:
> On Tue, Apr 22, 2014 at 12:45 PM, Alp Toker <alp at nuanti.com> wrote:
>> On 22/04/2014 11:59, Dmitri Gribenko wrote:
>>> +// The list is intentionally organized as one item per line to make it
>>> easier
>>> +// to compare with the HTML spec.
>>> +foreach AttrName = [
>>> + "onabort",
>>> + "onblur",
>>> + "oncancel",
>>> + "oncanplay",
>>> + "oncanplaythrough",
>>> + "onchange",
>>> + "onclick",
>>> + "onclose",
>>> + "oncuechange",
>>> + "ondblclick",
>>> + "ondrag",
>>> + "ondragend",
>>> + "ondragenter",
>>> + "ondragexit",
>>> + "ondragleave",
>>> + "ondragover",
>>> + "ondragstart",
>>> + "ondrop",
>>> + "ondurationchange",
>>> + "onemptied",
>>> + "onended",
>>> + "onerror",
>>> + "onfocus",
>>> + "oninput",
>>> + "oninvalid",
>>> + "onkeydown",
>>> + "onkeypress",
>>> + "onkeyup",
>>> + "onload",
>>> + "onloadeddata",
>>> + "onloadedmetadata",
>>> + "onloadstart",
>>> + "onmousedown",
>>> + "onmouseenter",
>>> + "onmouseleave",
>>> + "onmousemove",
>>> + "onmouseout",
>>> + "onmouseover",
>>> + "onmouseup",
>>> + "onmousewheel",
>>> + "onpause",
>>> + "onplay",
>>> + "onplaying",
>>> + "onprogress",
>>> + "onratechange",
>>> + "onreset",
>>> + "onresize",
>>> + "onscroll",
>>> + "onseeked",
>>> + "onseeking",
>>> + "onselect",
>>> + "onshow",
>>> + "onstalled",
>>> + "onsubmit",
>>> + "onsuspend",
>>> + "ontimeupdate",
>>> + "ontoggle",
>>> + "onvolumechange",
>>> + "onwaiting"
>>> + ] in {
>>> + def Attr#AttrName : EventHandlerContentAttribute<AttrName>;
>>> +}
>>
>> Hi Dmitri,
>>
>> I've been meaning to bring this up earlier but I really don't think this
>> kind of functionality belongs in the core of the clang AST/Basic.
>>
>> We often ask contributors to add even useful features like warnings and
>> checks to clang-tools-extra or third-party plugins where it might keep
>> things trim. So it's surprising to me that at the same time there's a
>> full-featured HTML5 generator and XML parser/tree being incrementally
>> hard-coded into clang/AST.
> Hi Alp,
>
> Please note that HTML and XML *generation* is in libIndex, not in
> libAST. Also, no piece of comment parsing (except for the diagnostic
> .td file) lives in libBasic.
>
> Comment parsing, including semantic analysis for HTML embedded in
> comments, is, indeed, in libAST. Actually, at first, I put it into
> libComment library, because I had the same feeling -- it did not
> belong into libAST, but I had to merge it into libAST because of a
> dependency cycle: comment parsing depends on libAST in order to
> inspect declarations, and libAST (specifically, ASTContext) depends on
> libComment in order to parse comments.
>
> If you do feel strongly about this, we could bounce off an abstract
> base class that is declared in libAST and the only implementation of
> which lives in libComment. Then libComment would depend on libAST,
> and something in libFrontend, which would depend on both libAST and
> libComment, would instantiate the comment parser appropriately and
> configure ASTContext with the instance. But I don't think that it is
> a significant cleanup of libAST, it introduced even more complexity.
> All of lib/AST/Comment* is just 3.5 KLoC, include/clang/AST/Comment*
> (including all supporting tables) is less than 3 KLoC. Compared to
> libAST as a whole, with over 100 KLoC, I don't think it adds much
> value to factor this small piece into a separate library.
Thanks for writing that up Dmitri.
I believe we can get a lot of mileage with some changes to internals.
I've been studying what's there and have some bite-sized suggestions:
1) RawComment::extractBriefText() introduces a dependency from the
lightweight comment handling into the full comment AST. Could you
extract this to a function like extractBriefText(RawComment&, ASTContext
&Context)?
2) Split out comment AST C API in libclang from Index.h into
CXComment.h. For now, include CXComment.h but this makes it easier to
develop alternatives (and easier for libclang users to understand the
separation of concerns).
(... and more items like, you get the idea)
By the way, my numbers don't agree with yours. I have a patch* here to
remove the doc comment functionality with all other tests passing:
138 files changed, 655 insertions(+), 17760 deletions(-)
That's about one fifth the size of the whole libAST to use your
benchmark :-)
While I'm not suggesting we actually apply the patch, it's a great way
to view the comment system as a whole and identify possible simplifications.
So, if we shake things up a little we can reuse most of this work while
addressing some of the issues. I'd bee keen to help out on this and have
patches.
Ultimately I believe the Actions and AST style approach needs to go away
in favour of something more direct and appropriate for parsing comments
that can be run after initial AST generation / compilation.
(The patch against r206882 was too big for the list, will look into
uploading it.)
Alp.
--
http://www.nuanti.com
the browser experts
More information about the cfe-commits
mailing list