[cfe-dev] [Review Request] char16_t and char32_t character literals

Yusaku Shiga yusaku.shiga at gmail.com
Sun May 22 06:10:37 PDT 2011


Hi, all,

I wrote a patch that implements the new char16_t and char32_t character
literals introduced in C++0x.
The implementation is based on 2.13.3 Character literals [lex.ccon] of the
C++0x draft N3242.
Could you review the patch and send me your comments and requests?

* New features

When sources are compiled with the -std=c++0x option, clang accepts char16_t
and char32_t character literals.

At this point there is one limitation; see (1) in the TODO list below.

* Source Example

char16_t a = u'a';  // u'a' has type char16_t. The value is \u0061 in UTF-16.
char32_t b = U'b';  // U'b' has type char32_t. The value is \U00000062 in UTF-32.
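
For reviewers who want to check the expected behaviour quickly, here is a
minimal sketch of the same example written as compile-time checks. It is not
part of the patch; the helper names valueOf and checkLiteralValues are made up
for illustration, and it assumes a compiler that already accepts these
literals.

```cpp
#include <cassert>
#include <type_traits>

// u'...' should have type char16_t and U'...' should have type char32_t.
static_assert(std::is_same<decltype(u'a'), char16_t>::value,
              "u'a' should have type char16_t");
static_assert(std::is_same<decltype(U'b'), char32_t>::value,
              "U'b' should have type char32_t");

// For ASCII characters the literal's value is simply the code point.
static_assert(u'a' == 0x0061, "u'a' should be U+0061");
static_assert(U'b' == 0x00000062, "U'b' should be U+0062");

// Hypothetical helpers (not from the patch): widen a literal to an integer.
// Overload resolution picks the one matching the literal's type.
constexpr unsigned long valueOf(char16_t c) { return static_cast<unsigned long>(c); }
constexpr unsigned long valueOf(char32_t c) { return static_cast<unsigned long>(c); }

// Hypothetical runtime spot-check; call it from main() if desired.
inline void checkLiteralValues() {
  assert(valueOf(u'a') == 0x61ul);
  assert(valueOf(U'b') == 0x62ul);
}
```

If the file compiles with -std=c++0x, the type and value checks have passed.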


* Implementation

I added three token kinds: tok::wchar_constant, tok::utf16char_constant, and
tok::utf32char_constant.
(include/clang/Basic/TokenKinds.def)

If the Lexer finds a token starting with L', u' or U', it tries to construct
a character literal token whose kind is tok::wchar_constant,
tok::utf16char_constant, or tok::utf32char_constant, respectively.
(Lexer::LexTokenInternal(Token& Result), lib/Lex/Lexer.cpp)
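
The prefix dispatch described above can be sketched roughly as follows. This
is a standalone illustration, not the code in the patch and not clang's Lexer
API: the enum CharLiteralKind and the function classifyCharLiteralPrefix are
hypothetical names, and the real lexer works on a character buffer rather
than a std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the token kinds mentioned above.
enum class CharLiteralKind { None, Ordinary, Wide, UTF16, UTF32 };

// Hypothetical helper: classify a spelling by its first one or two
// characters. Only the prefix is inspected; the literal body is not validated.
inline CharLiteralKind classifyCharLiteralPrefix(const std::string &spelling) {
  if (spelling.empty())
    return CharLiteralKind::None;
  if (spelling[0] == '\'')
    return CharLiteralKind::Ordinary;   // 'x'
  if (spelling.size() < 2 || spelling[1] != '\'')
    return CharLiteralKind::None;       // e.g. an identifier starting with u
  switch (spelling[0]) {
  case 'L': return CharLiteralKind::Wide;   // L'x' -> wchar_t
  case 'u': return CharLiteralKind::UTF16;  // u'x' -> char16_t
  case 'U': return CharLiteralKind::UTF32;  // U'x' -> char32_t
  default:  return CharLiteralKind::None;
  }
}

// Hypothetical spot-check; call it from main() if desired.
inline void checkPrefixClassification() {
  assert(classifyCharLiteralPrefix("u'a'") == CharLiteralKind::UTF16);
  assert(classifyCharLiteralPrefix("U'b'") == CharLiteralKind::UTF32);
}
```

Anything that does not match one of the three prefixes falls through to the
existing handling, which is why an identifier such as "under" is reported as
None here.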

To make it easy to set the proper token kind on the character literal token
object, I modified the class CharLiteralParser so that it has a private member
holding the token kind, and added an argument to its constructor to take the
literal kind.
(include/clang/Lex/LiteralSupport.h, lib/Lex/LiteralSupport.cpp)

The parser and Sema set the appropriate type for wchar_t, char16_t and
char32_t literals.
(Parser::ParseCastExpression, lib/Parse/ParseExpr.cpp)


* TODO

(1) No code conversion.
At this point, only ASCII characters can be used in char16_t and char32_t
constants because I have not implemented the code conversion logic. I plan to
fix this in the next patch, which will add support for char16_t and char32_t
string literals.
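
To show what kind of conversion is missing, here is a minimal sketch that
decodes the first character of a UTF-8 byte sequence into a code point. It
assumes the source file is encoded in UTF-8, which the patch does not state,
and the names decodeFirstUTF8 and kInvalidCodePoint are hypothetical. It does
not reject overlong encodings or surrogate code points, so it is an
illustration only, not the logic planned for the next patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sentinel returned for malformed input.
const std::uint32_t kInvalidCodePoint = 0xFFFFFFFFu;

// Hypothetical helper: decode the first UTF-8 sequence in `bytes`.
// Checks only the lead byte pattern and the 10xxxxxx continuation bytes.
inline std::uint32_t decodeFirstUTF8(const std::string &bytes) {
  if (bytes.empty())
    return kInvalidCodePoint;
  const unsigned char lead = static_cast<unsigned char>(bytes[0]);
  std::size_t extra = 0;   // number of continuation bytes expected
  std::uint32_t cp = 0;    // code point accumulated so far
  if (lead < 0x80) {
    return lead;                                   // ASCII, one byte
  } else if ((lead & 0xE0) == 0xC0) {
    extra = 1; cp = lead & 0x1F;                   // two-byte sequence
  } else if ((lead & 0xF0) == 0xE0) {
    extra = 2; cp = lead & 0x0F;                   // three-byte sequence
  } else if ((lead & 0xF8) == 0xF0) {
    extra = 3; cp = lead & 0x07;                   // four-byte sequence
  } else {
    return kInvalidCodePoint;                      // stray continuation byte
  }
  if (bytes.size() < extra + 1)
    return kInvalidCodePoint;                      // truncated sequence
  for (std::size_t i = 1; i <= extra; ++i) {
    const unsigned char c = static_cast<unsigned char>(bytes[i]);
    if ((c & 0xC0) != 0x80)
      return kInvalidCodePoint;
    cp = (cp << 6) | (c & 0x3F);
  }
  return cp;
}

// A char16_t character literal holds a single 16-bit code unit.
inline bool fitsInChar16(std::uint32_t cp) { return cp <= 0xFFFFu; }

// Hypothetical spot-check; call it from main() if desired.
inline void checkDecoding() {
  assert(decodeFirstUTF8("a") == 0x61u);
  assert(decodeFirstUTF8("\xE3\x81\x82") == 0x3042u);
}
```

A code point above 0xFFFF does not fit in a single char16_t, so a u'...'
literal holding such a character would have to be diagnosed rather than
converted.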

(2) Code generation problem.
When an array of char32_t (a 4-byte aligned type) is defined, the array is
aligned to a 16-byte boundary instead of a 4-byte one. I think this is a bug
in clang. Lines 33 and 34 of test/CodeGenCXX/cxx0x-char-literal.cpp, a test
program in my patch, demonstrate the problem.
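
As a way to separate the alignment required by the type from the alignment
the compiler picks for a particular object, something like the sketch below
may help. The array `sample` and the helper isAlignedTo are hypothetical and
are not the arrays in my test file. Note that a compiler is allowed to place
an object at a stricter alignment than its type requires, and the x86-64 ABI
asks for at least 16-byte alignment for array variables of 16 bytes or more,
so the 16-byte alignment may turn out to be intended rather than a bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical global array, 16 bytes when char32_t is 4 bytes wide.
char32_t sample[4] = {U'a', U'b', U'c', U'd'};

// The alignment required by an array type equals that of its element type.
static_assert(alignof(decltype(sample)) == alignof(char32_t),
              "array type alignment should match the element type");

// Hypothetical helper: true if `p` is a multiple of `alignment`.
inline bool isAlignedTo(const void *p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical spot-check; call it from main() if desired.
// The object must satisfy at least the alignment its type requires.
inline void checkSampleAlignment() {
  assert(isAlignedTo(sample, alignof(char32_t)));
}
```

Whether isAlignedTo(sample, 16) also holds depends on the target and the
compiler, which is exactly the part that needs checking against the ABI.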


* Test Environment and Result
My patch is based on r131788.
I tested it on Fedora 14 x86_64.

There are no regressions.
However, one problem was found, as described in the TODO list.

-----
Yusaku Shiga
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cxx0x-char-literal.r131788.patch
Type: application/octet-stream
Size: 18088 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20110522/f7c42af2/attachment.obj>

