[PATCH] D104975: Implement P1949

Corentin Jabot via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Aug 14 04:32:05 PDT 2021


cor3ntin marked 2 inline comments as done.
cor3ntin added inline comments.


================
Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:117
+def err_character_not_allowed : Error<
+  "character <U+%0> not allowed in identifiers">;
 def ext_unicode_whitespace : ExtWarn<
----------------
rsmith wrote:
> The old diagnostic text here was bad in the case where the character was supposed to be part of an identifier. The new diagnostic text is bad in the case where the character is not supposed to be part of an identifier (eg, if you copy-paste a smart-quote into the source file). Is there some way we can phrase this diagnostic so it's reasonable regardless of the programmer's intent?
> 
> Perhaps the best we can do is to say that if an identifier is immediately followed by one or more "bad" Unicode characters that were probably intended to be identifier characters (things we don't recognize as whitespace or homoglyphs or anything else), then produce the nice "not allowed in identifiers" diagnostic (and maybe even treat the characters as part of the identifier for error recovery purposes), and otherwise diagnose as simply "unexpected character <U+%0>"? I also wonder if perhaps we should treat all unexpected characters as identifier characters for error recovery purposes.
I changed the PR so that, at the start of an identifier, if the Unicode character is neither a whitespace, identifier start or identifier continue,
we display `unexpected character <U+%0>`.
It's not perfect because we make no effort to distinguish pp-numbers from identifiers, but it seems good enough! Certainly an improvement :)

I also recover nicely for non-leading invalid Unicode codepoints. I am not sure we should do that for the leading case though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104975/new/

https://reviews.llvm.org/D104975



More information about the llvm-commits mailing list