[cfe-dev] [Review Request] char16_t and char32_t character literals
Yusaku Shiga
yusaku.shiga at gmail.com
Sun May 22 06:10:37 PDT 2011
Hi, all,
I wrote a patch to implement the new char16_t and char32_t character literals
introduced in C++0x.
My implementation is based on 2.13.3 Character literals [lex.ccon] in the C++0x
draft N3242.
Could you review my patch and send me your comments and requests?
* New features
When compiling sources with the -std=c++0x option, clang accepts the char16_t
and char32_t character literals.
At this point, there is a limitation. See also (1) in the TODO list below.
* Source Example
char16_t a = u'a'; // u'a' has char16_t type. The value is \u0061 in UTF-16.
char32_t b = U'b'; // U'b' has char32_t type. The value is \U00000062 in UTF-32.
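The types and values stated in the example can be checked at compile time. The following is only a sketch for a compiler that already supports these literals; literal_value is a hypothetical helper added for illustration and is not part of the patch.

```cpp
#include <type_traits>

// Compile-time checks of the literal types described above.
static_assert(std::is_same<decltype(u'a'), char16_t>::value,
              "u'a' should have type char16_t");
static_assert(std::is_same<decltype(U'b'), char32_t>::value,
              "U'b' should have type char32_t");

// For ASCII characters the value equals the code point.
static_assert(u'a' == 0x0061, "u'a' should be code point 0x0061");
static_assert(U'b' == 0x00000062, "U'b' should be code point 0x00000062");

// Hypothetical helper (not in the patch): exposes the numeric value
// of a literal so it can be inspected at run time.
constexpr unsigned long literal_value(char32_t c) {
  return static_cast<unsigned long>(c);
}
```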
* Implementation
I added 3 token kinds: tok::wchar_constant, tok::utf16char_constant, and
tok::utf32char_constant.
(include/clang/Basic/TokenKinds.def)
If the Lexer finds a token starting with L', u' or U', it tries to construct a
char literal token whose token kind is tok::wchar_constant,
tok::utf16char_constant, or tok::utf32char_constant, respectively.
(Lexer::LexTokenInternal(Token& Result), lib/Lex/Lexer.cpp)
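The prefix check described above can be sketched outside of clang as follows. This is not the code in Lexer.cpp; CharTokKind and classifyCharLiteralStart are hypothetical names that only mirror the idea of mapping a prefix plus an opening quote to a token kind.

```cpp
#include <string>

// Hypothetical stand-ins for the token kinds named in the patch;
// this is not clang's real tok:: enumeration.
enum class CharTokKind { Char, WideChar, Utf16Char, Utf32Char, NotACharLiteral };

// Classify the start of a token by its prefix and opening quote.
CharTokKind classifyCharLiteralStart(const std::string &s) {
  // No prefix: an ordinary character literal such as 'a'.
  if (s.size() >= 1 && s[0] == '\'')
    return CharTokKind::Char;
  // One-letter prefix directly followed by the opening quote.
  if (s.size() >= 2 && s[1] == '\'') {
    switch (s[0]) {
    case 'L': return CharTokKind::WideChar;
    case 'u': return CharTokKind::Utf16Char;
    case 'U': return CharTokKind::Utf32Char;
    default:  break;
    }
  }
  // Anything else (for example the identifier "ux") is not a char literal.
  return CharTokKind::NotACharLiteral;
}
```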
To make it easy to set the proper token kind on the char literal token
object, I modified the class CharLiteralParser so that it has a private member
holding the token kind, and added a constructor argument that takes the literal
kind.
(include/clang/Lex/LiteralSupport.h, lib/Lex/LiteralSupport.cpp)
The parser and Sema set the appropriate type for wchar_t, char16_t and char32_t
literals.
(Parser::ParseCastExpression, lib/Parse/ParseExpr.cpp)
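As a rough illustration of the design above, a literal object can carry its kind from the lexer to the stage that picks the type. MiniCharLiteral and CharKind are hypothetical names for this sketch; the real CharLiteralParser interface is in the patch and is not reproduced here.

```cpp
#include <string>

// Hypothetical literal kinds, standing in for the token kinds in the patch.
enum class CharKind { Ordinary, Wide, Utf16, Utf32 };

// Hypothetical miniature of a literal parser that remembers its kind.
class MiniCharLiteral {
public:
  // The kind is passed in through an extra constructor argument.
  MiniCharLiteral(unsigned long value, CharKind kind)
      : Value(value), Kind(kind) {}

  unsigned long getValue() const { return Value; }
  CharKind getKind() const { return Kind; }

  // Name of the type a later stage would assign to the literal.
  std::string typeName() const {
    switch (Kind) {
    case CharKind::Ordinary: return "char";
    case CharKind::Wide:     return "wchar_t";
    case CharKind::Utf16:    return "char16_t";
    case CharKind::Utf32:    return "char32_t";
    }
    return "char"; // unreachable for valid kinds
  }

private:
  unsigned long Value; // numeric value of the literal
  CharKind Kind;       // kind recorded at construction time
};
```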
* TODO
(1) No code conversion.
At this point, only ASCII characters are available in the char16_t and
char32_t constants because I have not implemented the code conversion logic.
I plan to fix this in the next patch, which will add support for char16_t and
char32_t string literals.
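To show what the missing code conversion involves, here is a minimal sketch that assumes the source file is UTF-8 encoded. It decodes one UTF-8 sequence to a code point and encodes a code point as UTF-16. decodeUtf8 and encodeUtf16 are hypothetical helpers, not the logic planned for the patch, and the decoder is simplified: it does not reject overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode one UTF-8 sequence starting at s[pos] and advance pos past it.
// Returns false on truncated or malformed input.
bool decodeUtf8(const std::string &s, std::size_t &pos, std::uint32_t &cp) {
  if (pos >= s.size()) return false;
  unsigned char b0 = static_cast<unsigned char>(s[pos]);
  std::size_t len;
  if (b0 < 0x80)                { cp = b0;        len = 1; }
  else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
  else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
  else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
  else return false; // stray continuation byte or invalid lead byte
  if (pos + len > s.size()) return false;
  for (std::size_t i = 1; i < len; ++i) {
    unsigned char b = static_cast<unsigned char>(s[pos + i]);
    if ((b & 0xC0) != 0x80) return false; // not a continuation byte
    cp = (cp << 6) | (b & 0x3F);
  }
  pos += len;
  return cp <= 0x10FFFF;
}

// Encode a code point as one or two UTF-16 code units.
std::vector<std::uint16_t> encodeUtf16(std::uint32_t cp) {
  std::vector<std::uint16_t> out;
  if (cp < 0x10000) {
    out.push_back(static_cast<std::uint16_t>(cp));
  } else {
    cp -= 0x10000; // split the remaining 20 bits into a surrogate pair
    out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
  }
  return out;
}
```

A char16_t character literal holds a single code unit, so the surrogate pair branch matters for string literals rather than for character literals.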
(2) Code gen problem.
When defining an array of char32_t (a 4-byte aligned type), the array is
aligned to a 16-byte boundary instead of a 4-byte one. I think this is a bug in
clang. A test program in my patch, test/CodeGenCXX/cxx0x-char-literal.cpp,
lines 33 and 34, demonstrates the problem.
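For reference, the alignment the language requires for such an array can be compared with the alignment of an actual object. This sketch does not reproduce the test in the patch; sample_array, requiredArrayAlign and isAlignedTo are hypothetical names. Note that an object may be placed at a stricter alignment than its type requires, and some ABIs ask for this for larger arrays, so the 16-byte figure may or may not turn out to be a defect.

```cpp
#include <cstddef>
#include <cstdint>

// A global char32_t array, similar in spirit to the case described above.
char32_t sample_array[4] = {U'a', U'b', U'c', U'd'};

// Alignment the language requires for the array type; an array type
// has the same alignment requirement as its element type.
constexpr std::size_t requiredArrayAlign() { return alignof(char32_t[4]); }

// Check whether an address satisfies a given alignment.
bool isAlignedTo(const void *p, std::size_t align) {
  return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}
```

Calling isAlignedTo(sample_array, 16) on a given build shows whether the object was over-aligned in that build.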
* Test Environment and Result
My patch is based on r131788.
I checked it on Fedora 14 x86_64.
There are no regressions.
However, a problem was found, as described in the TODO list.
-----
Yusaku Shiga
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cxx0x-char-literal.r131788.patch
Type: application/octet-stream
Size: 18088 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20110522/f7c42af2/attachment.obj>