[cfe-dev] Implementing charsets (-fexec-charset & -finput-charset)
Friedman, Eli via cfe-dev
cfe-dev at lists.llvm.org
Tue Jan 30 11:47:39 PST 2018
On 1/30/2018 8:18 AM, Sean Perry wrote:
>
> clang and llvm aren't performing any conversion right now. Everything
> assumes the input, output and exec charsets are UTF-8. One user
> scenario I am trying to enable is the input charset being EBCDIC for a
> system where EBCDIC is the charset. Doing this is non-trivial and
> exposes the issues I outlined below and most likely more (e.g. debug info).
>
Please don't mix together the issues of compiling for an EBCDIC target
and running LLVM on an EBCDIC host. I understand it's kind of tied
together from your perspective (since the end result you want is a
native compiler which runs on an EBCDIC target), but LLVM is always
built as a cross-compiler, so we need to consider them separately to get
a reasonable result.

If you're cross-compiling UTF-8-encoded source code on a UTF-8 host to an
EBCDIC target, you need conversions in a few places in clang:
specifically, symbol names need to be translated when IR is generated,
and string/character literals need to be translated by the lexer. And
the LLVM backend might also need to convert certain strings which are
emitted into object files.
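
To make the literal translation concrete, here is a minimal sketch in Python rather than clang code. The function name `to_exec_charset` is hypothetical, and `cp037` (an EBCDIC codec that ships with Python) is only a stand-in for whatever code page the target actually uses. It shows a literal held internally as UTF-8 being re-encoded for the execution charset, with a diagnostic when a character has no encoding there.

```python
# Hypothetical sketch: re-encode a string literal from the compiler's internal
# UTF-8 form into an EBCDIC execution charset. "cp037" is an assumption; a
# real target may use a different EBCDIC code page.

def to_exec_charset(literal: str, exec_charset: str = "cp037") -> bytes:
    """Return the bytes that would be emitted for `literal`."""
    try:
        return literal.encode(exec_charset)
    except UnicodeEncodeError as err:
        # A compiler would report a diagnostic here rather than raise.
        bad = literal[err.start]
        raise ValueError(
            f"character {bad!r} has no encoding in {exec_charset}"
        ) from err


if __name__ == "__main__":
    # Print the EBCDIC bytes of a literal as hex.
    print(to_exec_charset("Hello").hex())
```

The same kind of translation would apply to symbol names at IR generation, although which names are translated and where is a design decision this sketch does not address.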

If you're cross-compiling EBCDIC-encoded source code on a UTF-8 host to
a UTF-8 target, you need a conversion in exactly one place: the input
source code needs to be converted to UTF-8, once.
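
As a rough illustration of that single input conversion, again in Python and not clang's actual implementation: the helper name `source_to_utf8` is hypothetical, and `cp037` is an assumed stand-in for the real input charset. The raw bytes are decoded once when the file is read, and everything downstream sees only UTF-8.

```python
# Hypothetical sketch: convert an EBCDIC-encoded source buffer to UTF-8 once,
# at the point where the file is read. "cp037" is an assumed input charset.

def source_to_utf8(raw: bytes, input_charset: str = "cp037") -> bytes:
    """Decode the raw source bytes and return them re-encoded as UTF-8."""
    return raw.decode(input_charset).encode("utf-8")


if __name__ == "__main__":
    # Simulate an EBCDIC source file containing one declaration.
    ebcdic_source = "int x = 1;".encode("cp037")
    print(source_to_utf8(ebcdic_source).decode("utf-8"))
```

Line terminators are deliberately left out of the example, since EBCDIC systems do not necessarily use the same newline character as ASCII ones and a real conversion would have to decide how to map them.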

If you're cross-compiling EBCDIC-encoded source code on a UTF-8 host to
an EBCDIC target, you need both of the above conversions.

If you're compiling LLVM/clang for an EBCDIC host, everything becomes
complicated because both LLVM and clang assume they're running in an
ASCII-compatible locale; the issues you're describing are primarily
related to this. You probably want to leave this for last because a lot
of the changes involved will be controversial, and it'll be easier to
convince everyone it's useful if you have a usable target.
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project