[cfe-commits] [patch] Unicode character literals for UTF-8 source encoding

Eli Friedman eli.friedman at gmail.com
Mon Jan 9 20:56:57 PST 2012


On Mon, Jan 9, 2012 at 8:05 PM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
> Updated patches. There's an extra one for the change to ActOnCharacterConstant.
>

+  // FIXME: unify the logic for determining the type of the char literal
+  //  instead of repeating it here and in ActOnCharacterConstant
+  int available_bits;
+  if (tok::wide_char_constant == Kind)
+    available_bits = PP.getTargetInfo().getWCharWidth();
+  else if (tok::utf16_char_constant == Kind)
+    available_bits = PP.getTargetInfo().getChar16Width();
+  else if (tok::utf32_char_constant == Kind)
+    available_bits = PP.getTargetInfo().getChar32Width();
+  else if (!PP.getLangOptions().CPlusPlus || isMultiChar())
+    available_bits = PP.getTargetInfo().getIntWidth();
+  else
+    available_bits = PP.getTargetInfo().getCharWidth();

Thinking about it a bit more, I'm still not sure this is what we want
to do: do we really want to allow '\U0010FFFD' in C?  Strictly
speaking, it's implementation-defined, but I don't think there's any
precedent for the value we use with this patch.
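
To make the concern concrete, here is a minimal standalone sketch of the
width selection in the hunk above, plus a check of whether a code point
fits in the chosen width. It does not use Clang's API: CharKind,
TargetWidths, availableBits and fitsInBits are hypothetical names, and
the widths (8-bit char, 32-bit int and wchar_t) are assumed values for a
typical target, not something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the token kinds tested in the patch.
enum class CharKind { Ordinary, Wide, UTF16, UTF32 };

// Hypothetical target description; all widths are in bits and are
// assumed values for a typical target.
struct TargetWidths {
  unsigned CharWidth = 8;
  unsigned IntWidth = 32;
  unsigned WCharWidth = 32;
  unsigned Char16Width = 16;
  unsigned Char32Width = 32;
};

// Mirrors the if/else chain in the hunk: prefixed literals use the
// width of their own type; unprefixed literals use int in C (and for
// multi-character constants), and char otherwise.
unsigned availableBits(CharKind Kind, bool CPlusPlus, bool IsMultiChar,
                       const TargetWidths &T) {
  switch (Kind) {
  case CharKind::Wide:
    return T.WCharWidth;
  case CharKind::UTF16:
    return T.Char16Width;
  case CharKind::UTF32:
    return T.Char32Width;
  case CharKind::Ordinary:
    break;
  }
  return (!CPlusPlus || IsMultiChar) ? T.IntWidth : T.CharWidth;
}

// True if CodePoint is representable in Bits bits. The Bits >= 32 test
// avoids shifting a 32-bit value by its full width.
bool fitsInBits(std::uint32_t CodePoint, unsigned Bits) {
  assert(Bits > 0 && "width must be positive");
  return Bits >= 32 || (CodePoint >> Bits) == 0;
}
```

Under these assumed widths, 0x10FFFD fits for an unprefixed literal in C,
where the available width is that of int, but not for an unprefixed
single-character literal in C++, where it is that of char. That
difference is why the question comes up for C specifically.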

-Eli


