[LLVMbugs] [Bug 22067] New: Characters not in the basic character set aren't converted to \uXXXX form correctly

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Tue Dec 30 13:49:51 PST 2014


            Bug ID: 22067
           Summary: Characters not in the basic character set aren't
                    converted to \uXXXX form correctly
           Product: clang
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: C++11
          Assignee: unassignedclangbugs at nondot.org
          Reporter: nicolasweber at gmx.de
                CC: dgregor at apple.com, llvmbugs at cs.uiuc.edu
    Classification: Unclassified


#include <stdio.h>
#define RAW(x) R##x
const char c[] = RAW("(ü\n)");
int main() {

$ bin/clang -o foo test2.cc -isysroot $(xcrun -show-sdk-path) -std=c++11
$ ./foo

[lex.phases] says

Any source file character not in the basic source character set (2.3) is
replaced by the universal-character-name that designates that character.
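
As a rough illustration of that phase 1 rule (not clang's actual code), the sketch below spells a single code point the way the quoted sentence describes. The helper names is_basic_source_char and to_phase1_spelling are made up for this example, and the lowercase hex digits are an arbitrary choice of mine, since the standard does not prescribe a case.

```cpp
#include <cstdio>
#include <string>

// True if cp belongs to the C++11 basic source character set ([lex.charset]):
// letters, digits, five whitespace characters, and the listed punctuation.
bool is_basic_source_char(char32_t cp) {
    if (cp >= U'a' && cp <= U'z') return true;
    if (cp >= U'A' && cp <= U'Z') return true;
    if (cp >= U'0' && cp <= U'9') return true;
    static const std::u32string others =
        U" \t\v\f\n_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    return others.find(cp) != std::u32string::npos;
}

// Hypothetical helper: basic characters are kept as they are, everything
// else becomes a universal-character-name (\uXXXX or \UXXXXXXXX).
std::string to_phase1_spelling(char32_t cp) {
    if (is_basic_source_char(cp)) return std::string(1, static_cast<char>(cp));
    char buf[16];
    if (cp <= 0xFFFF)
        std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(cp));
    else
        std::snprintf(buf, sizeof buf, "\\U%08x", static_cast<unsigned>(cp));
    return buf;
}
```

Under this model the ü in the test case (U+00FC) is spelled as the six characters \u00fc once phase 1 has run.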

[lex.pptoken]p3 says that

If the input stream has been parsed into preprocessing tokens up to a given
character:
- If the next character begins a sequence of characters that could be the
prefix and initial double quote of a raw string literal, such as R", the next
preprocessing token shall be a raw string literal. Between the initial and
final double quote characters of the raw string, any transformations performed
in phases 1 and 2 (trigraphs, universal-character-names, and line splicing)
are reverted;
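
To make the "reverted" part concrete, here is a simplified model of undoing the universal-character-name transformation on the text between the quotes of a raw string. It is only a sketch: revert_ucns and append_utf8 are hypothetical names, the output encoding is assumed to be UTF-8, and a real lexer would have to tell UCNs introduced in phase 1 apart from ones the user typed, which this model does not attempt.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of cp (assumed to be a valid Unicode scalar value).
void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical model of the reversion: every \uXXXX or \UXXXXXXXX in s is
// turned back into the character it designates; all other text is copied.
std::string revert_ucns(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        std::size_t digits = 0;
        if (s[i] == '\\' && i + 1 < s.size()) {
            if (s[i + 1] == 'u') digits = 4;
            else if (s[i + 1] == 'U') digits = 8;
        }
        // Only treat it as a UCN if the full run of hex digits is present.
        bool ok = digits != 0 && i + 2 + digits <= s.size();
        for (std::size_t k = 0; ok && k < digits; ++k)
            ok = std::isxdigit(static_cast<unsigned char>(s[i + 2 + k])) != 0;
        if (ok) {
            append_utf8(out, std::stoul(s.substr(i + 2, digits), nullptr, 16));
            i += 2 + digits;
        } else {
            out += s[i++];
        }
    }
    return out;
}
```

In this model, a raw string lexed directly from the input would get ü back from \u00fc, while the backslash and n of \n stay as two separate characters.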

Richard explains that the way the pp-tokens are formed here prevents the raw
string reversion of phases 1 and 2 from applying, so shouldn't the output be


