[LLVMbugs] [Bug 18535] New: Clang++ accepts invalid Unicode character literals

Sat Jan 18 15:37:50 PST 2014

http://llvm.org/bugs/show_bug.cgi?id=18535

            Bug ID: 18535
           Summary: Clang++ accepts invalid Unicode character literals
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: C++11
          Assignee: unassignedclangbugs at nondot.org
          Reporter: wjl at icecavern.net
                CC: dgregor at apple.com, llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Clang++ accepts invalid Unicode character literals, such as U'\u0000'.

I found this bug while looking into this GCC issue:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

In that bug, I thought gcc was in error because it was munging my U'\u0000'
into the value 1 (which admittedly is BIZARRE), whereas Clang treated it with
the value I expected, which is 0.

However, it was pointed out in that bug that such a Unicode literal is
apparently invalid. This is quoted in the gcc source code in libcpp/charset.c,
apparently from the C99 standard (and I assume -- hopefully correctly -- that
this applies to C++11):

   C99 6.4.3: A universal character name shall not specify a character
   whose short identifier is less than 00A0 other than 0024 ($), 0040 (@),
   or 0060 (`), nor one in the range D800 through DFFF inclusive.
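
To make the quoted rule concrete, here is a minimal sketch of it as a C++
predicate. The name ucn_allowed_c99 is made up for illustration and is not
taken from clang or gcc. It encodes only the C99 wording quoted above; whether
the same restriction applies to C++11 is still the assumption stated earlier.

```cpp
#include <cstdint>

// Hypothetical helper (not from clang or gcc): returns true if the code
// point `cp` may be written as a universal character name under the
// C99 6.4.3 wording quoted above.
constexpr bool ucn_allowed_c99(std::uint32_t cp) {
    return (cp >= 0x00A0 ||      // anything below 00A0 is disallowed...
            cp == 0x0024 ||      // ...except '$'
            cp == 0x0040 ||      // ...except '@'
            cp == 0x0060)        // ...except '`'
        && !(cp >= 0xD800 && cp <= 0xDFFF);  // surrogates are never allowed
}

// Under this reading, U'\u0000' falls in the disallowed range.
static_assert(!ucn_allowed_c99(0x0000), "NUL is below 00A0");
static_assert(ucn_allowed_c99(0x0024), "'$' is explicitly permitted");
static_assert(!ucn_allowed_c99(0xD800), "surrogates are excluded");
```

Under this sketch, clang's current behavior matches only the surrogate half of
the predicate.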

Currently both clang and gcc give a compiler error if you try to use something
like U'\ud800', because it is a surrogate. However, *all* other literals work:
as posted in the gcc bug, I generated a program (17 MiB of source code) which
tests every possible Unicode literal, and they are all accepted and give the
right numeric value on clang, except for surrogates, which are rejected.
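
For illustration only, one way such an exhaustive test could be generated is
sketched below. This is not the generator attached to the gcc bug. The helper
name literal_check_line and the choice of static_assert as the check are
assumptions; the real program may test the values differently.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds one line of a generated test program that
// checks a single char32_t literal against its expected numeric value.
// Code points up to FFFF use the \uXXXX form, larger ones \UXXXXXXXX.
std::string literal_check_line(std::uint32_t cp) {
    char buf[96];
    const unsigned v = static_cast<unsigned>(cp);
    if (cp <= 0xFFFF)
        std::snprintf(buf, sizeof buf,
                      "static_assert(U'\\u%04X' == 0x%X, \"\");", v, v);
    else
        std::snprintf(buf, sizeof buf,
                      "static_assert(U'\\U%08X' == 0x%X, \"\");", v, v);
    return buf;
}
```

Calling this for every code point from 0 through 0x10FFFF and writing each
result on its own line would produce a source file of the kind described above.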
