[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang

Tom Honermann via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 16 10:09:18 PDT 2020


> -----Original Message-----
> From: Kai Peter Nacke <kai.nacke at de.ibm.com>
> Sent: Tuesday, June 16, 2020 11:17 AM
> To: Tom Honermann <thonerma at synopsys.com>
> Cc: Corentin <corentin.jabot at gmail.com>; llvm-dev at lists.llvm.org
> Subject: RE: [llvm-dev] RFC: Adding support for the z/OS platform to LLVM and
> clang
> 
> Tom Honermann <Thomas.Honermann at synopsys.com> wrote on 16.06.2020
> 16:53:33:
> 
> > > > > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1)
> > > > > encoded
> > >
> > > > > input source files. This would be done at the file open time to
> > > > > allow the rest of Clang to operate as if the source was UTF-8 and so
> > > > > require no changes downstream. Feedback on this plan is welcome from
> > > > > the Clang community.
> > > > > Would it be correct to assume that this EBCDIC -> UTF-8 mapping
> > > > > would be as prescribed by UTF-EBCDIC / IBM CDRA, notably for the
> > > > > control characters that do not map exactly?
> > > > > Notably, if the execution encoding is EBCDIC, is '0x06' equivalent
> > > > > to '0086', etc?
> > > > >
> > > > > The question "Is Unicode sufficient to represent all characters
> > > > > present in the input source without using the Private Use Area?"
> > > > > is one that is relevant to both Clang and the C/C++ standard. ( I do
> > > > > hope that it is the case!)
> > >
> > > > The current goal is to make only minimal changes to the frontend to
> > > > enable reading of EBCDIC encoded files. For this, we use the
> > > > auto-conversion service of z/OS UNIX System Services (
> > > > https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm
> > > > ), together with file tagging and setting the CCSID for the program
> > > > and for opened files. The auto-conversion service supports round-trip
> > > > conversion between EBCDIC and Enhanced ASCII. With it, bootstrapping
> > > > with EBCDIC source files is possible.
> > > > Of course, more complete UTF-8 support is a valid implementation
> > > > alternative.
> >
> > Other good references:
> > - The 'chtag' utility
> >   https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.bpxa500/chtag.htm
> > - File tagging overview
> >   https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/cbc1p273.htm
> >
> > Kai, would use of auto conversion require that users set the
> > _BPXK_AUTOCVT, _BPXK_CCSIDS, and/or _BPXK_PCCSID environment
> > variables?  Or do you envision having the clang driver set them before
> > invocation of the compiler?  If the latter, that would imply that
> > users (and tests) are responsible for setting them for direct 'clang
> > -cc1' invocations.
> 
> Hi Tom,
> the current approach is to enable auto conversion only if _BPXK_AUTOCVT is set
> to ON. If the variable is not set, then all input files are treated as EBCDIC. The
> rationale behind this is that we do not want to outsmart the user.
> So there is no problem with direct `clang -cc1` invocations. It's a good hint that
> we need to describe this setup somewhere.

That seems reasonable.  How would you handle _BPXK_AUTOCVT being set to ALL?

(
For anyone following along, the difference between ON and ALL is described at https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/setenv.htm#setenv:
> When _BPXK_AUTOCVT is ON, automatic conversion can only take place between IBM-1047 and ISO8859-1 code sets. Other CCSID pairs are not supported for automatic text conversion. To request automatic conversion for any CCSID pairs that Unicode service supports, set _BPXK_AUTOCVT to ALL.
)
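
To make the cases concrete, here is a minimal sketch of the policy as described in this thread. It is not code from any patch; the names InputHandling, classifyAutoCvt, and inputHandlingFromEnvironment are hypothetical. It assumes an exact, case-sensitive match on the value, and it assumes that any value other than ON or ALL (for example OFF) falls back to EBCDIC. ALL is reported as undecided because that is the open question above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical names for illustration only; nothing here comes from an
// actual LLVM/Clang patch.
enum class InputHandling {
  TreatAsEbcdic, // no auto conversion; input files are read as EBCDIC
  AutoConvert,   // rely on z/OS auto conversion (IBM-1047 <-> ISO8859-1)
  Undecided      // _BPXK_AUTOCVT=ALL: behavior still under discussion
};

// Classify a raw _BPXK_AUTOCVT value. A null pointer means "not set".
// Assumption: values are compared exactly and case-sensitively.
InputHandling classifyAutoCvt(const char *Value) {
  if (Value == nullptr)
    return InputHandling::TreatAsEbcdic; // unset: treat all input as EBCDIC
  const std::string V(Value);
  if (V == "ON")
    return InputHandling::AutoConvert;   // the only value that enables it
  if (V == "ALL")
    return InputHandling::Undecided;     // open question in this thread
  // Assumption: OFF and unrecognized values behave like "not set".
  return InputHandling::TreatAsEbcdic;
}

// Read the variable from the process environment and classify it.
InputHandling inputHandlingFromEnvironment() {
  return classifyAutoCvt(std::getenv("_BPXK_AUTOCVT"));
}
```

Because the decision depends only on the environment of the compiler process, a direct 'clang -cc1' invocation would follow the same path as one launched by the driver.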

Tom.


