[PATCH] D108742: [WIP] Reclassify form-feed and vertical tab as vertical WS for the purposes of lexing.

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat Aug 28 04:03:57 PDT 2021


cor3ntin added a comment.

In D108742#2970302 <https://reviews.llvm.org/D108742#2970302>, @rsmith wrote:

> In D108742#2970283 <https://reviews.llvm.org/D108742#2970283>, @cor3ntin wrote:
>
>>> Drive-by observation: under P2348 <https://reviews.llvm.org/P2348>, Clang's behavior of treating `\n\r` as a single new-line would be "non-standard" (requiring special phase 1 mapping). Is that intentional? `\n\r` is used as a new-line character on old Mac systems.
>>
>> Somewhat. `\n\r` is not described by Unicode so we could either mandate that all implementation support that or leave it as implementation-defined mapping. Correct me if I am wrong, but as the line number is itself implementation-defined, whether there are one or 2 line breaks would not materially affect the standard, either way.
>
> Yes, I suppose that's true. Though if we're nailing down exactly how new lines are defined and asking every conforming implementation to support UTF-8 and such, maybe it's time to also define how the presumed line number is determined? =)

I'm not sure that we want people to rely on the value of `__LINE__`, to be honest.

Anyway, I've been thinking a lot about this, and I don't think this change is worth pursuing. I've realized that all compilers treat VT and FF as non-line-breaking horizontal whitespace, and even if that's not consistent with Unicode, it is consistent with other programming languages.
I'll modify the paper to keep treating these code points as horizontal whitespace, as this seems to be the sane, non-disruptive course of action.
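
To make the classification discussed above concrete, here is a rough sketch of what "FF and VT are horizontal whitespace" and "`\n\r` counts as one new-line" could look like in a lexer. The helper names (`isHorizontalWS`, `isVerticalWS`, `countLineBreaks`) are hypothetical and chosen for illustration; this is not Clang's actual implementation, only a minimal model of the behavior described in this thread.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration only; not Clang's lexer API.

// Horizontal whitespace: space, tab, and (as described in this thread)
// form-feed and vertical tab. None of these end a line.
constexpr bool isHorizontalWS(char c) {
  return c == ' ' || c == '\t' || c == '\f' || c == '\v';
}

// Vertical whitespace: only LF and CR terminate a line in this model.
constexpr bool isVerticalWS(char c) {
  return c == '\n' || c == '\r';
}

// Count line breaks, folding a two-character pair ("\r\n" or "\n\r")
// into a single break. Two identical terminators ("\n\n") count twice.
constexpr std::size_t countLineBreaks(std::string_view src) {
  std::size_t breaks = 0;
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (!isVerticalWS(src[i]))
      continue;
    ++breaks;
    // If the next character is the *other* terminator, consume it too.
    if (i + 1 < src.size() && isVerticalWS(src[i + 1]) && src[i + 1] != src[i])
      ++i;
  }
  return breaks;
}
```

Under this model, a form-feed or vertical tab in the middle of a line never changes the line count, which is the non-disruptive behavior I'd like the paper to preserve.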

I will also try to address your feedback regarding Acorn systems in the paper.

Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108742/new/

https://reviews.llvm.org/D108742


