[LLVMbugs] [Bug 18535] New: Clang++ accepts invalid Unicode character literals
bugzilla-daemon at llvm.org
Sat Jan 18 15:37:50 PST 2014
http://llvm.org/bugs/show_bug.cgi?id=18535
Bug ID: 18535
Summary: Clang++ accepts invalid Unicode character literals
Product: clang
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: C++11
Assignee: unassignedclangbugs at nondot.org
Reporter: wjl at icecavern.net
CC: dgregor at apple.com, llvmbugs at cs.uiuc.edu
Classification: Unclassified
Clang++ accepts invalid Unicode character literals, such as U'\u0000'.
I found this bug while investigating a related issue in GCC:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873
In that bug, I thought gcc was in error because it was munging my U'\u0000'
into the value 1 (which admittedly is bizarre), whereas Clang gave it the
value I expected, which is 0.
However, it was pointed out in that bug that such a Unicode literal is
apparently invalid. The rule is quoted in the gcc source code in
libcpp/charset.c, apparently from the C99 standard (and I assume, hopefully
correctly, that this applies to C++11):
C99 6.4.3: A universal character name shall not specify a character
whose short identifier is less than 00A0 other than 0024 ($), 0040 (@),
or 0060 (`), nor one in the range D800 through DFFF inclusive.
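To make the quoted rule concrete, here is a minimal sketch of it as a C++11
predicate. The name is_valid_ucn_c99 is hypothetical and is not taken from the
GCC or Clang sources; it encodes only the wording quoted above and does not
check an upper bound on the code point.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not compiler code).
// Returns true when the code point may be spelled as a universal character
// name under the C99 6.4.3 wording quoted in this report.
constexpr bool is_valid_ucn_c99(std::uint32_t cp) {
    return !(cp >= 0xD800 && cp <= 0xDFFF)      // surrogate range is excluded
        && (cp >= 0x00A0                        // 00A0 and above is allowed
            || cp == 0x0024                     // $ exception
            || cp == 0x0040                     // @ exception
            || cp == 0x0060);                   // ` exception
}

// Compile-time checks of the cases discussed in this report.
static_assert(!is_valid_ucn_c99(0x0000), "code point 0 is below 00A0");
static_assert(is_valid_ucn_c99(0x0024), "dollar sign is an explicit exception");
static_assert(!is_valid_ucn_c99(0xD800), "surrogates are excluded");
```

Under this reading, the code point 0 falls in the disallowed range, which is
the case this report is about.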
Currently clang (and gcc) yield a compiler error if you try to use something
like U'\ud800' because it is a surrogate. However, *all* other literals are
accepted. As posted in the gcc bug, I generated a program (17 MiB of source
code) that tests every possible Unicode literal; on clang they are all
accepted and give the right numeric value, except for surrogates, which are
rejected.
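The generator itself is attached to the gcc bug rather than shown here. As a
rough sketch of how such a test program could be produced, the C++ helpers
below emit one check per code point and skip surrogates. All names
(is_surrogate, literal_for, check_line, checks_for_range) are hypothetical,
and the use of static_assert for each check is an assumption, not necessarily
what the original program does.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: true for code points in the surrogate range.
bool is_surrogate(std::uint32_t cp) { return cp >= 0xD800 && cp <= 0xDFFF; }

// Spell a code point as a char32_t literal, using the 4-digit form up to
// FFFF and the 8-digit form above that.
std::string literal_for(std::uint32_t cp) {
    char buf[32];
    if (cp <= 0xFFFF)
        std::snprintf(buf, sizeof buf, "U'\\u%04X'", static_cast<unsigned>(cp));
    else
        std::snprintf(buf, sizeof buf, "U'\\U%08X'", static_cast<unsigned>(cp));
    return buf;
}

// One generated source line that checks the literal against its number.
std::string check_line(std::uint32_t cp) {
    char value[16];
    std::snprintf(value, sizeof value, "0x%X", static_cast<unsigned>(cp));
    return "static_assert(" + literal_for(cp) + " == " + value + ", \"\");\n";
}

// Generated checks for every code point in [first, last], minus surrogates.
std::string checks_for_range(std::uint32_t first, std::uint32_t last) {
    std::string out;
    if (first > last) return out;
    for (std::uint32_t cp = first;; ++cp) {
        if (!is_surrogate(cp)) out += check_line(cp);
        if (cp == last) break;  // stop here to avoid wrapping past last
    }
    return out;
}
```

Writing checks_for_range(0, 0x10FFFF) to a file and compiling the result with
each compiler would then show which literals are accepted and which are
rejected.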