[LLVMbugs] [Bug 24342] New: std::char_traits<char16_t>::eof() returns valid code unit

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Mon Aug 3 08:16:26 PDT 2015


https://llvm.org/bugs/show_bug.cgi?id=24342

            Bug ID: 24342
           Summary: std::char_traits<char16_t>::eof() returns valid code
                    unit
           Product: libc++
           Version: 3.6
          Hardware: Macintosh
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: All Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: david_work at me.com
                CC: llvmbugs at cs.uiuc.edu, mclow.lists at gmail.com
    Classification: Unclassified

[char.traits.specializations.char16_t] §21.2.3.2/3 says,

"The member eof() shall return an implementation-defined constant that cannot
appear as a valid UTF-16 code unit."

In libc++ it returns 0xDFFF, which is a valid low (trailing) surrogate, the
second half of a surrogate pair. Surrogate pairs are only needed outside the
Basic Multilingual Plane, so it won't often be seen, but code points like
U+123FF are valid and are encoded as the pair 0xD808 0xDFFF.
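
The arithmetic behind that claim can be checked with a small sketch. The
names SurrogatePair and encode_supplementary are hypothetical helpers
written for this illustration, not part of libc++ or the standard library;
the sketch assumes a C++11 compiler and performs its checks at compile time.

```cpp
#include <cassert>

// Hypothetical helper type for this illustration: one UTF-16 surrogate pair.
struct SurrogatePair {
    char16_t high;  // leading surrogate, 0xD800..0xDBFF
    char16_t low;   // trailing surrogate, 0xDC00..0xDFFF
};

// Encode a supplementary-plane code point (U+10000..U+10FFFF) as UTF-16.
// The caller is assumed to pass a code point in that range; no validation
// is done here.
constexpr SurrogatePair encode_supplementary(char32_t cp) {
    // Subtracting 0x10000 leaves a 20-bit value. Its top 10 bits go into
    // the high surrogate and its bottom 10 bits into the low surrogate.
    return SurrogatePair{
        static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10)),
        static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))};
}

// U+123FF - 0x10000 = 0x23FF; top bits = 0x8, bottom bits = 0x3FF.
static_assert(encode_supplementary(0x123FF).high == 0xD808,
              "high surrogate of U+123FF");
static_assert(encode_supplementary(0x123FF).low == 0xDFFF,
              "low surrogate of U+123FF");
```

Any code point whose low 10 bits (after subtracting 0x10000) are all ones
produces 0xDFFF as its second code unit, so the collision is not limited to
U+123FF.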

On the other hand, U+FFFF is a "noncharacter," "intended for process-internal
uses," like the preceding code point U+FFFE, which is the byte-swapped form of
the byte order mark U+FEFF. (http://unicode.org/charts/PDF/UFFF0.pdf) U+FFFF is
the value used by most other environments, including libstdc++, and it
coincides with WEOF when wchar_t is UTF-16.
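
As a rough way to see the practical effect, the sketch below asks whether a
given code unit is indistinguishable from eof() once converted to int_type.
The helpers is_surrogate, is_low_surrogate, looks_like_eof and
eof_collides_with_surrogate are hypothetical names introduced here; only
std::char_traits and its members come from the standard library. What the
two runtime functions return depends on the standard library in use, so no
result is claimed for them.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification helpers for UTF-16 code units.
constexpr bool is_surrogate(unsigned u) {
    return u >= 0xD800 && u <= 0xDFFF;
}
constexpr bool is_low_surrogate(unsigned u) {
    return u >= 0xDC00 && u <= 0xDFFF;
}

// True if code unit c compares equal to eof() after conversion to int_type,
// i.e. a stream could mistake it for end-of-file.
inline bool looks_like_eof(char16_t c) {
    using traits = std::char_traits<char16_t>;
    return traits::eq_int_type(traits::to_int_type(c), traits::eof());
}

// True if this library's eof() value falls inside the surrogate range.
inline bool eof_collides_with_surrogate() {
    using traits = std::char_traits<char16_t>;
    return is_surrogate(static_cast<unsigned>(traits::eof()));
}

// The value reported for libc++ 3.6 is a low surrogate; U+FFFF is not a
// surrogate at all.
static_assert(is_low_surrogate(0xDFFF), "0xDFFF is a low surrogate");
static_assert(!is_surrogate(0xFFFF), "0xFFFF is outside the surrogate range");
```

With an eof() of 0xDFFF, as reported above, looks_like_eof(0xDFFF) would be
expected to return true for the second code unit of U+123FF; with an eof()
of 0xFFFF, only the noncharacter U+FFFF could be affected.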
