[cfe-commits] [PATCH] Comment parsing: resolve HTML character references (e.g., &amp; -> &)

Jordan Rose jordan_rose at apple.com
Wed Jul 25 15:05:26 PDT 2012


On Jul 25, 2012, at 14:59 , Dmitri Gribenko <gribozavr at gmail.com> wrote:

> On Wed, Jul 25, 2012 at 2:56 PM, Jordan Rose <jordan_rose at apple.com> wrote:
>> 
>> On Jul 25, 2012, at 14:54 , Dmitri Gribenko <gribozavr at gmail.com> wrote:
>> 
>>> On Wed, Jul 25, 2012 at 2:51 PM, Jordan Rose <jordan_rose at apple.com> wrote:
>>>> This seems like a very bad idea when I have this in a comment:
>>>> 
>>>> <em>0&lt;i</em>
>>>> 
>>>> If you expand the '&lt;', you end up with invalid HTML. Entities are
>>>> supposed to be entities when they come out the other end.
>>> 
>>> '&lt;' will be expanded in the internal representation.  HTML renderer
>>> will escape HTML special characters back.
>> 
>> …as long as my test case is emitted unchanged, I don't mind, but I think it's non-trivial to expand entities in "<em>0&lt;i</em>" and keep track of which "<" are supposed to be escaped.
> 
> '<em>' is a separate AST node and "0&lt;i" is (three) plain text nodes,
> so it is actually simple.
> 
> Added your example to tests.

Oh right. Forgot you were already lexing HTML. Okay, once again sorry for the noise.
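
For readers following along, here is a rough sketch of why the round trip is safe once tags and text are separate nodes. It is not Clang's comment lexer: the TOKEN pattern, lex_comment, and render_html are hypothetical names invented for illustration, and the tag grammar is deliberately simplistic (no attributes). Only html.unescape and html.escape are real Python standard-library calls.

```python
import html
import re

# Hypothetical token pattern (an assumption, not Clang's grammar):
# a bare HTML tag, a character reference, or plain text.
TOKEN = re.compile(
    r"(?P<tag></?[A-Za-z][A-Za-z0-9]*\s*>)"
    r"|(?P<ref>&#?[A-Za-z0-9]+;)"
    r"|(?P<text>[^<&]+|[<&])"
)

def lex_comment(source):
    """Split comment text into ("tag", s) and ("text", s) nodes.

    Character references are resolved here, so text nodes hold the
    literal character ("<") rather than the reference ("&lt;").
    """
    nodes = []
    for m in TOKEN.finditer(source):
        if m.lastgroup == "tag":
            nodes.append(("tag", m.group()))
        elif m.lastgroup == "ref":
            nodes.append(("text", html.unescape(m.group())))
        else:
            nodes.append(("text", m.group()))
    return nodes

def render_html(nodes):
    """Emit tag nodes verbatim and re-escape special characters in text nodes."""
    return "".join(
        s if kind == "tag" else html.escape(s, quote=False)
        for kind, s in nodes
    )
```

With this sketch, lex_comment("<em>0&lt;i</em>") yields a tag node, three text nodes ("0", "<", "i"), and a closing tag node, and render_html turns them back into "<em>0&lt;i</em>". The renderer never has to guess which "<" to escape, because the node kind already records it.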


