[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang
Tom Honermann via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 16 07:53:33 PDT 2020
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Kai Peter Nacke
> via llvm-dev
> Sent: Tuesday, June 16, 2020 8:51 AM
> To: Corentin <corentin.jabot at gmail.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: Adding support for the z/OS platform to LLVM and
> clang
>
> > > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> > > input source files. This would be done at the file open time to allow the
> > > rest of Clang to operate as if the source was UTF-8 and so require no
> > > changes downstream. Feedback on this plan is welcome from the Clang
> > > community.
> >
> > Would it be correct to assume that this EBCDIC -> UTF-8 mapping would
> > be as prescribed by UTF-EBCDIC / IBM CDRA, notably for the control
> > characters that do not map exactly?
> > Notably, if the execution encoding is EBCDIC, is '0x06' equivalent to
> > '0086', etc?
> >
> > The question "Is Unicode sufficient to represent all characters
> > present in the input source without using the Private Use Area?" is
> > one that is relevant to both Clang and the C/C++ standard. (I do hope
> > that it is the case!)
>
> The current goal is to make only minimal changes to the frontend to enable
> reading of EBCDIC encoded files. For this, we use the auto-conversion service of
> z/OS UNIX System Services (
> https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm
> ), together with file tagging and setting the CCSID for the program and for
> opened files. The auto-conversion service supports round-trip conversion
> between EBCDIC and Enhanced ASCII. With it, bootstrapping with EBCDIC
> source files is possible.
> Of course, more complete UTF-8 support is a valid implementation alternative.
Other good references:
- The 'chtag' utility
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.bpxa500/chtag.htm
- File tagging overview
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/cbc1p273.htm
Kai, would use of auto conversion require that users set the _BPXK_AUTOCVT, _BPXK_CCSIDS, and/or _BPXK_PCCSID environment variables? Or do you envision having the clang driver set them before invocation of the compiler? If the latter, that would imply that users (and tests) are responsible for setting them for direct 'clang -cc1' invocations.
Here is another possible direction to consider that would provide a more portable facility. Clang has interfaces for overriding file contents with a memory buffer; see the overrideFileContents() overloads in SourceManager. It should be straightforward, when loading a file, to determine whether a conversion is needed (e.g., by considering file tags, environment variables, command line options, etc.) and, if so, to transcode the file contents and register the resulting buffer as an override. This would be useful for implementing -finput-charset and would benefit deployments in Microsoft environments that have source files in ISO-8859 encodings.
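To make the idea concrete, here is a rough sketch of the "decide, then transcode" step for the simplest case, ISO-8859-1 to UTF-8. Everything named here (SourceEncoding, latin1ToUtf8, transcodeIfNeeded) is hypothetical and not an existing Clang interface; how the encoding is detected is left open, and the EBCDIC case would need a real conversion table or service. The resulting string is what one would wrap in a memory buffer and hand to one of the overrideFileContents() overloads.

```cpp
#include <string>

// Hypothetical: the encoding decided from file tags, environment
// variables, or command line options. Detection itself is not shown.
enum class SourceEncoding { UTF8, Latin1 };

// Hypothetical helper (not a Clang API): transcode ISO-8859-1 to UTF-8.
// Every ISO-8859-1 byte maps to the Unicode code point of the same value,
// so bytes below 0x80 are copied and the rest become two-byte sequences.
std::string latin1ToUtf8(const std::string &Input) {
  std::string Output;
  Output.reserve(Input.size());
  for (unsigned char Byte : Input) {
    if (Byte < 0x80) {
      Output.push_back(static_cast<char>(Byte));
    } else {
      Output.push_back(static_cast<char>(0xC0 | (Byte >> 6)));
      Output.push_back(static_cast<char>(0x80 | (Byte & 0x3F)));
    }
  }
  return Output;
}

// Hypothetical driver for the decision: returns true and fills Converted
// only when a conversion was needed. The caller would then register
// Converted as the override buffer for the file; otherwise the file is
// loaded as usual.
bool transcodeIfNeeded(SourceEncoding Encoding, const std::string &Raw,
                       std::string &Converted) {
  if (Encoding == SourceEncoding::UTF8)
    return false;
  Converted = latin1ToUtf8(Raw);
  return true;
}
```

The point of keeping the conversion behind a single predicate is that the rest of the frontend continues to see only UTF-8, whichever mechanism ends up supplying the encoding information.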
Tom.