[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang
Kai Peter Nacke via llvm-dev
llvm-dev at lists.llvm.org
Wed Jun 17 05:06:03 PDT 2020
Tom Honermann <Thomas.Honermann at synopsys.com> wrote on 16.06.2020 19:09:18:
> From: Tom Honermann <Thomas.Honermann at synopsys.com>
> To: Kai Peter Nacke <kai.nacke at de.ibm.com>
> Cc: Corentin <corentin.jabot at gmail.com>, "llvm-dev at lists.llvm.org"
> <llvm-dev at lists.llvm.org>
> Date: 16.06.2020 19:09
> Subject: [EXTERNAL] RE: [llvm-dev] RFC: Adding support for the z/OS
> platform to LLVM and clang
>
> > -----Original Message-----
> > From: Kai Peter Nacke <kai.nacke at de.ibm.com>
> > Sent: Tuesday, June 16, 2020 11:17 AM
> > To: Tom Honermann <thonerma at synopsys.com>
> > Cc: Corentin <corentin.jabot at gmail.com>; llvm-dev at lists.llvm.org
> > Subject: RE: [llvm-dev] RFC: Adding support for the z/OS platform to
> > LLVM and clang
> >
> > Tom Honermann <Thomas.Honermann at synopsys.com> wrote on 16.06.2020
> > 16:53:33:
> >
> > > > > > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1)
> > > > > > encoded input source files. This would be done at the file open
> > > > > > time to allow the rest of Clang to operate as if the source was
> > > > > > UTF-8 and so require no changes downstream. Feedback on this plan
> > > > > > is welcome from the Clang community.
> > > > >
> > > > > Would it be correct to assume that this EBCDIC -> UTF-8 mapping
> > > > > would be as prescribed by UTF-EBCDIC / IBM CDRA, notably for the
> > > > > control characters that do not map exactly?
> > > > > Notably, if the execution encoding is EBCDIC, is '0x06' equivalent
> > > > > to '0086', etc?
> > > > >
> > > > > The question "Is Unicode sufficient to represent all characters
> > > > > present in the input source without using the Private Use Area?"
> > > > > is one that is relevant to both Clang and the C/C++ standard. (I do
> > > > > hope that it is the case!)
> > > >
> > > > The current goal is to make only minimal changes to the frontend to
> > > > enable reading of EBCDIC encoded files. For this, we use the
> > > > auto-conversion service of z/OS UNIX System Services
> > > > (https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm),
> > > > together with file tagging and setting the CCSID for the program and
> > > > for opened files. The auto-conversion service supports round-trip
> > > > conversion between EBCDIC and Enhanced ASCII. With it, bootstrapping
> > > > with EBCDIC source files is possible.
> > > > Of course, more complete UTF-8 support is a valid implementation
> > > > alternative.
> > >
> > > Other good references:
> > > - The 'chtag' utility
> > >   https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.bpxa500/chtag.htm
> > > - File tagging overview
> > >   https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/cbc1p273.htm
> > >
> > > Kai, would use of auto conversion require that users set the
> > > _BPXK_AUTOCVT, _BPXK_CCSIDS, and/or _BPXK_PCCSID environment
> > > variables? Or do you envision having the clang driver set them before
> > > invocation of the compiler? If the latter, that would imply that
> > > users (and tests) are responsible for setting them for direct 'clang
> > > -cc1' invocations.
> >
> > Hi Tom,
> > the current approach is to enable auto conversion only if _BPXK_AUTOCVT
> > is set to ON. If the variable is not set, then all input files are
> > treated as EBCDIC. The rationale behind this is that we do not want to
> > outsmart the user.
> > So there is no problem with direct `clang -cc1` invocations. It's a good
> > hint that we need to describe this setup somewhere.
>
> That seems reasonable. How would you handle _BPXK_AUTOCVT being set to ALL?
>
> (For anyone following along, the difference between ON and ALL is
> described at
> https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/setenv.htm#setenv:
> > When _BPXK_AUTOCVT is ON, automatic conversion can only take place
> > between IBM-1047 and ISO8859-1 code sets. Other CCSID pairs are not
> > supported for automatic text conversion. To request automatic
> > conversion for any CCSID pairs that Unicode service supports, set
> > _BPXK_AUTOCVT to ALL.
> )
>
> Tom.
>
That's a bit more complicated. For reading files, I can imagine the
following approach:
- the application still uses the ASCII execution mode (to link against
  the ASCII version of the library)
- on each file handle, the program CCSID is set to UTF-8 (1208)
- auto-conversion on the file is turned on if
  - _BPXK_AUTOCVT is set to ALL, and
  - the file is untagged (assuming EBCDIC 1047) or the file tag is not 1208
Writing text files would need a default encoding. Using UTF-8 (1208) would
make sense.
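
The reading rule above can be written down as a small decision function. The following is only a minimal sketch in portable C++, not code from any patch: it calls no z/OS service, and the names ConversionPlan, planRead, and CCSID_UNTAGGED are hypothetical. It assumes that an untagged file reports CCSID 0 and that both conditions for turning on auto-conversion must hold.

```cpp
#include <string>

// CCSIDs mentioned in this thread.
constexpr int CCSID_UTF8 = 1208;
constexpr int CCSID_IBM1047 = 1047;
// Assumption for this sketch: an untagged file reports CCSID 0.
constexpr int CCSID_UNTAGGED = 0;

// Hypothetical result type: what should be done for one input file.
struct ConversionPlan {
  bool Convert;  // turn auto-conversion on for this file handle?
  int FromCcsid; // encoding assumed for the bytes on disk
  int ToCcsid;   // program CCSID set on the file handle
};

// Hypothetical helper. Autocvt is the value of _BPXK_AUTOCVT (empty if the
// variable is unset); FileTagCcsid is the CCSID from the file tag.
ConversionPlan planRead(const std::string &Autocvt, int FileTagCcsid) {
  // Untagged files are assumed to be EBCDIC 1047.
  int From = FileTagCcsid == CCSID_UNTAGGED ? CCSID_IBM1047 : FileTagCcsid;
  // Convert only when ALL is requested and the file is not already UTF-8.
  bool Convert = Autocvt == "ALL" && From != CCSID_UTF8;
  return {Convert, From, CCSID_UTF8};
}
```

For example, planRead("ALL", 0) asks for a 1047 to 1208 conversion, while planRead("ALL", 1208) leaves the file alone. On z/OS the plan would still have to be applied to the file handle through the platform services, which this sketch leaves out.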
This is really a "rough" first thought. I gave it a quick try, and it
failed. Most likely I overlooked something.
Best regards,
Kai Nacke
IT Architect
IBM Deutschland GmbH