[PATCH] D59765: [Lex] Warn about invisible Hangul whitespace

Brian Gesiak via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 25 06:38:53 PDT 2019


modocache created this revision.
modocache added reviewers: chandlerc, rsmith.
Herald added a subscriber: jdoerfert.
Herald added a project: clang.

On Twitter, @LunarLambda pointed out that Clang accepts Hangul whitespace
Unicode characters in identifiers, which lets users write very confusing
programs: https://twitter.com/LunarLambda/status/1110097030423240705

Clang already warns about similar invisible Unicode characters. Add the
Hangul filler characters, full-width `<U+3164>` (HANGUL FILLER) and
half-width `<U+FFA0>` (HALFWIDTH HANGUL FILLER), to the set that Clang
warns about.

N.B.: Clang also warns about the Japanese space character `<U+3000>`, but
in a different way. That character is not a valid identifier character
according to the C++11 standard, so Clang warns that it will treat
`<U+3000>` as whitespace. The Korean Hangul whitespace characters, by
contrast, are valid identifier characters according to the C++11
standard. Clang therefore keeps them as identifier characters, not
whitespace, and warns that the identifier contains a character that is
invisible in some environments.


Repository:
  rC Clang

https://reviews.llvm.org/D59765

Files:
  lib/Lex/Lexer.cpp
  test/Lexer/unicode.c


Index: test/Lexer/unicode.c
===================================================================
--- test/Lexer/unicode.c
+++ test/Lexer/unicode.c
@@ -39,10 +39,12 @@
                  // expected-warning@-1 {{treating Unicode character <U+037E> as identifier character rather than as ';' symbol}}
 int v=[=](auto){return~x;}(); // expected-warning 12{{treating Unicode character}}
 
-int ⁠xx‍;
+int ⁠xx‍xㅤᅠ;
 // expected-warning@-1 {{identifier contains Unicode character <U+2060> that is invisible in some environments}}
 // expected-warning@-2 {{identifier contains Unicode character <U+FEFF> that is invisible in some environments}}
 // expected-warning@-3 {{identifier contains Unicode character <U+200D> that is invisible in some environments}}
+// expected-warning@-4 {{identifier contains Unicode character <U+3164> that is invisible in some environments}}
+// expected-warning@-5 {{identifier contains Unicode character <U+FFA0> that is invisible in some environments}}
 int foo​bar = 0; // expected-warning {{identifier contains Unicode character <U+200B> that is invisible in some environments}}
 int x = foobar; // expected-error {{undeclared identifier}}
 
Index: lib/Lex/Lexer.cpp
===================================================================
--- lib/Lex/Lexer.cpp
+++ lib/Lex/Lexer.cpp
@@ -1528,6 +1528,7 @@
     {U'\u2227', '^'}, // LOGICAL AND
     {U'\u2236', ':'}, // RATIO
     {U'\u223c', '~'}, // TILDE OPERATOR
+    {U'\u3164', 0},   // HANGUL FILLER
     {U'\ua789', ':'}, // MODIFIER LETTER COLON
     {U'\ufeff', 0},   // ZERO WIDTH NO-BREAK SPACE
     {U'\uff01', '!'}, // FULLWIDTH EXCLAMATION MARK
@@ -1558,6 +1559,7 @@
     {U'\uff5c', '|'}, // FULLWIDTH VERTICAL LINE
     {U'\uff5d', '}'}, // FULLWIDTH RIGHT CURLY BRACKET
     {U'\uff5e', '~'}, // FULLWIDTH TILDE
+    {U'\uffa0', 0},   // HALFWIDTH HANGUL FILLER
     {0, 0}
   };
   auto Homoglyph =

