[llvm-bugs] [Bug 41536] New: [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Apr 18 19:35:27 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=41536
Bug ID: 41536
Summary: [clang-cl] incorrectly encodes ordinary string
literals containing universal-character-names in UTF-8
Product: clang
Version: 8.0
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: C++
Assignee: unassignedclangbugs at nondot.org
Reporter: Casey at Carter.net
CC: blitzrakete at gmail.com, dgregor at apple.com,
erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
richard-llvm at metafoo.co.uk
Compiling this program:
extern const char str[] = "\u0020\u00f4\u00e2";
int main() {
    static_assert(sizeof(str) == 4, "BOOM");
    static_assert(sizeof(str) != 6, "BOOM");
}
with "cl /FA /c" (presumably any version; notably the 19.20 release on Compiler
Explorer) succeeds and produces assembly output containing the line:
?str@@3QBDB DB ' ', 0f4H, 0e2H, 00H ; str
Note that the three universal-character-names (UCNs) have been replaced with
their encodings in the execution character set, WINDOWS-1252, as required by
[lex.ccon]/9; for these three characters the WINDOWS-1252 code unit values
happen to be the same as the Unicode code points.
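To make the size arithmetic concrete, here is a minimal C++ sketch of the cl behavior described above. It assumes a simplified model of WINDOWS-1252 that only covers the code points where it reuses the Unicode value (U+0000-U+007F and U+00A0-U+00FF); the helper names `encode_windows1252_subset` and `array_size_with_nul` are hypothetical and are not taken from cl or clang.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not cl's implementation): maps each code point to one
// WINDOWS-1252 byte, but only where WINDOWS-1252 reuses the Unicode value
// (U+0000-U+007F and U+00A0-U+00FF). Everything else, including U+0080 and the
// characters WINDOWS-1252 places at 0x80-0x9F (e.g. U+20AC at 0x80), is treated
// as unrepresentable by this simplified model.
std::optional<std::string> encode_windows1252_subset(const std::vector<char32_t>& code_points) {
    std::string out;
    for (char32_t cp : code_points) {
        const bool same_value = cp <= 0x7F || (cp >= 0xA0 && cp <= 0xFF);
        if (!same_value) {
            return std::nullopt;  // analogous to cl rejecting the literal
        }
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    assert(out.size() == code_points.size());  // exactly one byte per character
    return out;
}

// sizeof a char array initialized from a string literal: the bytes plus the NUL.
std::size_t array_size_with_nul(const std::string& bytes) {
    return bytes.size() + 1;
}
```

Under this model, {U+0020, U+00F4, U+00E2} encodes to 3 bytes, so the array is 4 bytes including the terminator, and {U+0080} yields `std::nullopt`, mirroring cl's rejection. A complete converter would also need the non-Unicode-valued mappings in the 0x80-0x9F block.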
Compiling the same program with "clang-cl /FA /c" does not succeed: both
static_asserts fire. With them commented out, the generated assembly contains
the line:
.asciz " \303\264\303\242"
which, despite using superior AT&T syntax, is substantially different. The UCNs
are now encoded in UTF-8 (two bytes each for U+00F4 and U+00E2, so the array
occupies 6 bytes rather than 4), which will produce mojibake when output to a
console expecting WINDOWS-1252. \u0080 is another good example: cl rejects it
because U+0080 has no representation in WINDOWS-1252, whereas clang-cl encodes
U+0080 in UTF-8 without complaint.
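For comparison, a minimal sketch of the standard UTF-8 encoding that clang-cl applies instead. `encode_utf8` and `asciz_escape` are hypothetical illustration helpers, not clang's implementation; the encoder does not reject surrogate code points, and the escaper only roughly imitates how the `.asciz` directive above prints bytes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Standard UTF-8 encoding of each code point (illustrative; surrogates are not rejected).
std::string encode_utf8(const std::vector<char32_t>& code_points) {
    std::string out;
    auto put = [&out](unsigned v) {
        out.push_back(static_cast<char>(static_cast<unsigned char>(v)));
    };
    for (char32_t cp : code_points) {
        assert(cp <= 0x10FFFF && "outside the Unicode code space");
        if (cp < 0x80) {
            put(cp);                          // 1 byte:  0xxxxxxx
        } else if (cp < 0x800) {
            put(0xC0 | (cp >> 6));            // 2 bytes: 110xxxxx 10xxxxxx
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            put(0xE0 | (cp >> 12));           // 3 bytes
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {
            put(0xF0 | (cp >> 18));           // 4 bytes
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Prints printable ASCII as-is and every other byte as a three-digit octal escape.
std::string asciz_escape(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b >= 0x20 && b < 0x7F && b != '"' && b != '\\') {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back('\\');
            out.push_back(static_cast<char>('0' + ((b >> 6) & 7)));
            out.push_back(static_cast<char>('0' + ((b >> 3) & 7)));
            out.push_back(static_cast<char>('0' + (b & 7)));
        }
    }
    return out;
}
```

Encoding {U+0020, U+00F4, U+00E2} gives 5 bytes (6 with the terminator, which is why both static_asserts fire), and escaping them reproduces the ` \303\264\303\242` string from the `.asciz` line. U+0080 becomes the two bytes 0xC2 0x80.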