[cfe-dev] Implementing charsets (-fexec-charset & -finput-charset)

Friedman, Eli via cfe-dev cfe-dev at lists.llvm.org
Tue Jan 30 11:47:39 PST 2018


On 1/30/2018 8:18 AM, Sean Perry wrote:
>
> clang and llvm aren't performing any conversion right now. Everything 
> assumes the input, output and exec charsets are UTF-8. One user 
> scenario I am trying to enable is the input charset being EBCDIC for a 
> system where EBCDIC is the charset. Doing this is non-trivial and 
> exposes the issues I outlined below and most likely more (e.g., debug info).
>

Please don't mix together the issues of compiling for an EBCDIC target 
and running LLVM on an EBCDIC host.  I understand it's kind of tied 
together from your perspective (since the end result you want is a 
native compiler which runs on an EBCDIC target), but LLVM is always 
built as a cross-compiler, so we need to consider them separately to get 
a reasonable result.

If you're cross-compiling UTF-8-encoded source code on a UTF-8 host to an
EBCDIC target, you need conversions in a few places in clang:
specifically, symbol names need to be translated when IR is generated, 
and string/character literals need to be translated by the lexer.  And 
the LLVM backend might also need to convert certain strings which are 
emitted into object files.
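A minimal sketch of the literal translation described above. It is written in Python for brevity rather than clang's C++. It assumes the EBCDIC target uses code page 037, picked only because Python ships a `cp037` codec; z/OS systems more commonly use IBM-1047. `encode_literal` is a hypothetical helper, not a clang API. It shows that the same literal occupies different bytes depending on the execution character set:

```python
# Hypothetical sketch of -fexec-charset handling for a string literal.
# Assumption: the EBCDIC target uses code page 037 (Python's "cp037" codec).


def encode_literal(text: str, exec_charset: str = "cp037") -> bytes:
    """Return the bytes a string literal occupies in the execution charset,
    including the terminating NUL (0x00 in both ASCII and EBCDIC)."""
    return text.encode(exec_charset) + b"\x00"


if __name__ == "__main__":
    print(("Hello".encode("utf-8") + b"\x00").hex())  # 48656c6c6f00
    print(encode_literal("Hello").hex())              # c88593939600
```

The same kind of re-encoding would apply to symbol names at IR generation time. Literals are the case where the bytes visibly end up in the program's data.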

If you're cross-compiling EBCDIC-encoded source code on a UTF-8 host to 
a UTF-8 target, you need a conversion in exactly one place: the input
source code needs to be converted to UTF-8, once.
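For this case, the single conversion can be pictured as one decode/encode pass over the source buffer before lexing. Everything after that point sees UTF-8. This is again only a sketch: `load_source` is a hypothetical name, and code page 037 stands in for whatever EBCDIC variant the source actually uses.

```python
# Hypothetical sketch of -finput-charset handling: convert the raw source
# buffer to UTF-8 once, then hand it to the rest of the front end.
# Assumption: the EBCDIC source uses code page 037.


def load_source(raw: bytes, input_charset: str = "cp037") -> bytes:
    """Convert a source buffer from input_charset to UTF-8, exactly once."""
    if input_charset.lower().replace("_", "-") in ("utf-8", "utf8"):
        return raw  # already in the front end's internal encoding
    return raw.decode(input_charset).encode("utf-8")


if __name__ == "__main__":
    ebcdic_source = "int main(void) { return 0; }\n".encode("cp037")
    print(load_source(ebcdic_source))  # b'int main(void) { return 0; }\n'
```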

If you're cross-compiling EBCDIC-encoded source code on a UTF-8 host to 
an EBCDIC target, you need both of the above conversions.
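Combining the two, an EBCDIC-to-EBCDIC build still goes through UTF-8 internally. The source is decoded once on input, and each literal is re-encoded for the target when it is lexed. Below is a hypothetical sketch (`translate_literal` is not a real clang function) that assumes code page 037 for both charsets by default:

```python
# Hypothetical sketch: EBCDIC source compiled for an EBCDIC target.
# Assumption: both charsets default to code page 037.


def translate_literal(source_bytes: bytes,
                      input_charset: str = "cp037",
                      exec_charset: str = "cp037") -> bytes:
    # Step 1 (-finput-charset): source encoding -> internal UTF-8.
    internal = source_bytes.decode(input_charset).encode("utf-8")
    # Step 2 (-fexec-charset): internal UTF-8 -> target encoding, plus NUL.
    return internal.decode("utf-8").encode(exec_charset) + b"\x00"


if __name__ == "__main__":
    src = "Hi".encode("cp037")
    out = translate_literal(src)
    # With identical input and exec charsets the bytes round-trip unchanged.
    assert out == src + b"\x00"
    print(out.hex())  # c88900
```

Going through UTF-8 also covers builds where the input and execution charsets are different EBCDIC code pages. Those code pages disagree on some punctuation; for example, `[` is encoded differently in code pages 037 and 500.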

If you're compiling LLVM/clang for an EBCDIC host, everything becomes 
complicated because both LLVM and clang assume they're running in an
ASCII-compatible locale; the issues you're describing are primarily
related to this.  You probably want to leave this for last because a lot 
of the changes involved will be controversial, and it'll be easier to 
convince everyone it's useful if you have a usable target.
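Here is one concrete instance of the kind of ASCII assumption that surfaces on an EBCDIC host. A range check like `'A' <= c && c <= 'Z'` presumes the uppercase letters are contiguous. That holds in ASCII but not in EBCDIC, where the letters sit in three blocks with gaps between them. The sketch below is in Python, assumes code page 037 as the host charset, and uses hypothetical helper names. It counts the bytes such a check wrongly accepts:

```python
from __future__ import annotations

# Illustration of an ASCII-ness assumption that breaks on an EBCDIC host.
# Assumption: the EBCDIC host charset is code page 037.

UPPERCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def is_upper_by_range(b: int, charset: str) -> bool:
    """The ASCII-style idiom 'A' <= b <= 'Z', using the charset's own codes."""
    lo = "A".encode(charset)[0]
    hi = "Z".encode(charset)[0]
    return lo <= b <= hi


def range_false_positives(charset: str) -> list[int]:
    """Bytes accepted by the range check that do not encode A-Z."""
    real_upper = {ch.encode(charset)[0] for ch in UPPERCASE}
    return [b for b in range(256)
            if is_upper_by_range(b, charset) and b not in real_upper]


if __name__ == "__main__":
    print(len(range_false_positives("ascii")))  # 0
    print(len(range_false_positives("cp037")))  # 15
```

In ASCII the check is exact. In code page 037 it also accepts the 15 non-letter bytes that fall between `A` (0xC1) and `Z` (0xE9).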

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
