[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang
Kai Peter Nacke via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 16 08:17:02 PDT 2020
Tom Honermann <Thomas.Honermann at synopsys.com> wrote on 16.06.2020 16:53:33:
> > > > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1)
> > > > encoded input source files. This would be done at the file open
> > > > time to allow the rest of Clang to operate as if the source was
> > > > UTF-8 and so require no changes downstream. Feedback on this plan
> > > > is welcome from the Clang community.
> > >
> > > Would it be correct to assume that this EBCDIC -> UTF-8 mapping would
> > > be as prescribed by UTF-EBCDIC / IBM CDRA, notably for the control
> > > characters that do not map exactly?
> > > Notably, if the execution encoding is EBCDIC, is '0x06' equivalent to
> > > '0086', etc?
> > >
> > > The question "Is Unicode sufficient to represent all characters
> > > present in the input source without using the Private Use Area?" is
> > > one that is relevant to both Clang and the C/C++ standard. (I do hope
> > > that it is the case!)
> >
> > The current goal is to make only minimal changes to the frontend to
> > enable reading of EBCDIC encoded files. For this, we use the
> > auto-conversion service of z/OS UNIX System Services (
> > https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm
> > ), together with file tagging and setting the CCSID for the program
> > and for opened files. The auto-conversion service supports round-trip
> > conversion between EBCDIC and Enhanced ASCII. With it, bootstrapping
> > with EBCDIC source files is possible.
> > Of course, more complete UTF-8 support is a valid implementation
> > alternative.
>
> Other good references:
> - The 'chtag' utility
>   https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.bpxa500/chtag.htm
> - File tagging overview
>   https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/cbc1p273.htm
>
> Kai, would use of auto conversion require that users set the
> _BPXK_AUTOCVT, _BPXK_CCSIDS, and/or _BPXK_PCCSID environment
> variables? Or do you envision having the clang driver set them
> before invocation of the compiler? If the latter, that would imply
> that users (and tests) are responsible for setting them for direct
> 'clang -cc1' invocations.
Hi Tom,
the current approach is to enable auto conversion only if _BPXK_AUTOCVT is
set to ON. If the variable is not set, then all input files are treated as
EBCDIC. The rationale behind this is that we do not want to outsmart the
user. So there is no problem with direct `clang -cc1` invocations. It's a
good hint that we need to describe this setup somewhere.
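To make the rule above concrete, here is a minimal sketch in Python. It is not the actual patch: the function name and the returned strings are hypothetical, and the exact, case-sensitive comparison with "ON" is an assumption (the behaviour for values other than ON is not settled above; this sketch treats them like an unset variable).

```python
import os
from typing import Mapping, Optional


def choose_input_handling(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "autoconvert" when _BPXK_AUTOCVT is ON, else "ebcdic".

    Hypothetical helper: it only models the decision, it does not
    perform any conversion itself.
    """
    if env is None:
        env = os.environ
    # Assumption: only the exact value "ON" enables auto conversion.
    # An unset variable (or any other value) means "treat input as EBCDIC".
    if env.get("_BPXK_AUTOCVT") == "ON":
        return "autoconvert"
    return "ebcdic"
```

Because the decision depends only on the environment, a direct `clang -cc1` invocation would see the same behaviour as one going through the driver.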
> Here is another possible direction to consider that would provide a
> more portable facility. Clang has interfaces for overriding file
> contents with a memory buffer; see the overrideFileContents()
> overloads in SourceManager. It should be straightforward to, when
> loading a file, make a determination as to whether a conversion is
> needed (e.g., consider file tags, environment variables, command
> line options, etc...) and, if needed, transcode the file contents
> and register the resulting buffer as an override. This would be
> useful for implementation of -finput-charset and would benefit
> deployments in Microsoft environments that have source files in
> ISO-8859 encodings.
That's a good hint. I'll definitely have a look at it, as it sounds like
it could solve some problems/complexity. A separate solution would then
still be required for LLVM.
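A rough model of that "transcode on load, then register an override" flow, written in Python rather than against the Clang API. Everything here is hypothetical: `overrides` merely stands in for the override table behind overrideFileContents(), `load_with_override` is an invented name, and the file tag is assumed to be a Python codec name. Python's standard library has no IBM-1047 codec, so cp037 is used as a stand-in EBCDIC code page.

```python
from typing import Dict, Optional

# Stand-in for the table of overridden file contents (hypothetical).
overrides: Dict[str, bytes] = {}


def needs_conversion(tag: Optional[str]) -> bool:
    """Untagged, UTF-8 and ASCII inputs are passed through unchanged."""
    return tag is not None and tag.lower() not in ("utf-8", "ascii")


def load_with_override(path: str, raw: bytes, tag: Optional[str]) -> bytes:
    """Return the bytes the rest of the frontend would see for `path`.

    If the tag calls for it, transcode to UTF-8 and record the result
    as an override; otherwise hand back the raw bytes untouched.
    """
    if needs_conversion(tag):
        utf8 = raw.decode(tag).encode("utf-8")
        overrides[path] = utf8
        return utf8
    return raw


# Usage: an EBCDIC (cp037 stand-in) source file becomes UTF-8.
ebcdic_source = "int x;".encode("cp037")
converted = load_with_override("a.c", ebcdic_source, "cp037")
```

The point of the design is that only the load step knows about encodings; everything downstream keeps assuming UTF-8.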
> Tom.
Best regards,
Kai Nacke
IT Architect
IBM Deutschland GmbH