[cfe-dev] gcc switch -fexec-charset=IBM-1047 to generate EBCDIC character constants

Sun Apr 13 18:33:25 PDT 2014

On Fri, Apr 11, 2014 at 2:21 AM, John P. Hartmann <jphartmann at gmail.com> wrote:

> This post is about integers expressed as character constants, e.g.,
> 'a'. Strings, e.g., "a", are not an issue in the assembler code
> produced.
>
> Consider a trivial program:
>
>    int main(void) {char a='a';return a;}
>
> Compiling with
>
>    clang -target s390x-linux-gnu -S -D__x86_64__ -O2 test.c
>
> Gets me this assembler:
>
> main:
>         lghi    %r2, 97
>         br      %r14
>
> That is all correct, since z/Linux is an ASCII operating system.
>
> However, there are other operating systems for IBM's z/Architecture that
> use the EBCDIC encoding, and there one wants 0x81 for 'a'.
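
For concreteness, here is a minimal sketch of the ASCII-to-IBM-1047 translation this implies. `ascii_to_ibm1047` is a hypothetical helper, not a gcc or clang function. It covers only letters, digits and space, and it assumes the compiler's own (host) character set is ASCII. EBCDIC letters come in three non-contiguous runs, so 'a'..'z' cannot be translated with a single offset.

```c
#include <assert.h>

/* Assumption for this sketch: the host character set is ASCII,
 * so the character constant 'a' has the value 0x61 here. */
static_assert('a' == 0x61, "this sketch assumes an ASCII host");

/* Map an ASCII letter, digit or space to its IBM-1047 (EBCDIC) code point.
 * Returns -1 for characters this partial table does not cover. */
int ascii_to_ibm1047(unsigned char c)
{
    /* Lowercase letters: a-i, j-r and s-z are three separate runs. */
    if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
    if (c >= 'j' && c <= 'r') return 0x91 + (c - 'j');
    if (c >= 's' && c <= 'z') return 0xA2 + (c - 's');
    /* Uppercase letters follow the same split pattern. */
    if (c >= 'A' && c <= 'I') return 0xC1 + (c - 'A');
    if (c >= 'J' && c <= 'R') return 0xD1 + (c - 'J');
    if (c >= 'S' && c <= 'Z') return 0xE2 + (c - 'S');
    /* Digits are contiguous. */
    if (c >= '0' && c <= '9') return 0xF0 + (c - '0');
    if (c == ' ') return 0x40;
    return -1; /* punctuation etc. deliberately omitted */
}
```

With this table, `ascii_to_ibm1047('a')` yields 0x81 rather than the 0x61 (97) that ends up in the `lghi` instruction above.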
>
> If the constant 'a' could pass through the compilation system (even if
> the assembler does not support such constants), I would have less than a
> smop;

... what's a smop?

> as it is now, the code generated is indistinguishable from a = 0x61,
> which should not be converted to EBCDIC.
>
> Any pointer to where I should start hacking would be greatly appreciated.
>
> An alternative would be a target specification triple, e.g.,
> s390-zvm-cms, but one would still wish to specify the target code page
> as Germans are likely to want a different one from the French.  And
> presumably that also means a new back end (?)
>
> Finally, the gcc implementation is not optimal because the conversion is
> also applied to strings, in particular the format strings passed to
> printf(), and that severely messes up format checking: the EBCDIC string
> is scanned for an ASCII %, which is not helpful.
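
The format-checking problem can be sketched like this: a checker that looks for ASCII '%' (0x25) finds no conversions at all in a format string whose bytes are already IBM-1047, where '%' is 0x6C. `count_conversions` is a hypothetical simplification, not gcc's actual format-checking code; a real checker parses complete conversion specifications.

```c
#include <assert.h>
#include <stddef.h>

/* Count '%' conversion introducers in an already-encoded format string.
 * `percent` is the code point of '%' in the execution character set:
 * 0x25 in ASCII, 0x6C in IBM-1047. An escaped "%%" is not counted. */
size_t count_conversions(const unsigned char *fmt, size_t len,
                         unsigned char percent)
{
    assert(fmt != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (fmt[i] != percent)
            continue;
        if (i + 1 < len && fmt[i + 1] == percent) {
            i++; /* skip the second byte of "%%" */
            continue;
        }
        count++;
    }
    return count;
}
```

For the IBM-1047 bytes of "%d %%" (0x6C 0x84 0x40 0x6C 0x6C), searching for 0x25 finds nothing, while searching for 0x6C finds the single %d conversion.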

Wait, you want character literals and string literals to use different
encodings? That sounds like a phenomenally bad idea to me. Also,
if your printf assumes ASCII, it sounds like your implementation's
execution character set really is ASCII...

Supporting execution character sets that are not ASCII is probably not too
burdensome, but if we're going to do it, we should do it right (using the
same character set for all character and string literals with no
encoding prefix). If you want to go ahead with that, start by looking at
lib/Lex/LiteralSupport.cpp.
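
A minimal sketch of that rule, under stated assumptions: every literal without an encoding prefix, character or string alike, goes through one execution-charset mapping, while a u8-prefixed string literal keeps its bytes. The names `encode_literal`, `to_exec_charset` and `enum literal_prefix` are hypothetical and do not correspond to anything in lib/Lex/LiteralSupport.cpp. Only lowercase letters are mapped (IBM-1047 code points), and an unmapped character is reported as an error instead of being passed through silently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical literal prefixes; a real front end has more (L, u, U). */
enum literal_prefix { PREFIX_NONE, PREFIX_U8 };

/* Execution-charset mapping for this sketch: lowercase letters only,
 * IBM-1047 code points. Returns -1 when there is no mapping. */
static int to_exec_charset(unsigned char c)
{
    if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
    if (c >= 'j' && c <= 'r') return 0x91 + (c - 'j');
    if (c >= 's' && c <= 'z') return 0xA2 + (c - 's');
    return -1;
}

/* Encode the body of a literal into out[]; the same path is used for
 * character literals and string literals. Returns 0 on success, -1 if
 * a character cannot be represented (a diagnostic in a real compiler). */
int encode_literal(enum literal_prefix prefix, const char *src, size_t len,
                   unsigned char *out)
{
    assert((src != NULL && out != NULL) || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (prefix == PREFIX_U8) {
            out[i] = c; /* u8"...": bytes stay UTF-8 */
            continue;
        }
        int e = to_exec_charset(c);
        if (e < 0)
            return -1;
        out[i] = (unsigned char)e;
    }
    return 0;
}
```

Because the character literal 'a' and the string "abz" take the same path, the character constant and the first byte of the string always agree (0x81).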