[PATCH] D104975: Implement P1949

Richard Smith - zygoloid via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 13 17:00:14 PDT 2021


rsmith added inline comments.


================
Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:117
+def err_character_not_allowed : Error<
+  "character <U+%0> not allowed in identifiers">;
 def ext_unicode_whitespace : ExtWarn<
----------------
The old diagnostic text here was bad in the case where the character was supposed to be part of an identifier. The new diagnostic text is bad in the case where the character is not supposed to be part of an identifier (eg, if you copy-paste a smart-quote into the source file). Is there some way we can phrase this diagnostic so it's reasonable regardless of the programmer's intent?

Perhaps the best we can do is to say that if an identifier is immediately followed by one or more "bad" Unicode characters that were probably intended to be identifier characters (things we don't recognize as whitespace or homoglyphs or anything else), then produce the nice "not allowed in identifiers" diagnostic (and maybe even treat the characters as part of the identifier for error recovery purposes), and otherwise diagnose as simply "unexpected character <U+%0>"? I also wonder if perhaps we should treat all unexpected characters as identifier characters for error recovery purposes.


================
Comment at: clang/lib/Lex/Lexer.cpp:3182
+    }
+    Diag(BufferPtr, diag::err_character_not_allowed)
+        << CharBuf
----------------
It's a bit strange that we produce this warning only on characters that aren't allowed in identifiers, and don't warn on identifier continuation characters that aren't valid identifier start characters (such as a standalone combining character). Is there a good reason for that (specifically for the `!isAllowedIDChar` check above)?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104975/new/

https://reviews.llvm.org/D104975



More information about the llvm-commits mailing list