[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang

Kai Peter Nacke via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 16 08:17:02 PDT 2020


Tom Honermann <Thomas.Honermann at synopsys.com> wrote on 16.06.2020 16:53:33:

> > > > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> > > > input source files. This would be done at the file open time to allow
> > > > the rest of Clang to operate as if the source was UTF-8 and so require
> > > > no changes downstream. Feedback on this plan is welcome from the Clang
> > > > community.
> > >
> > > Would it be correct to assume that this EBCDIC -> UTF-8 mapping would
> > > be as prescribed by UTF-EBCDIC / IBM CDRA, notably for the control
> > > characters that do not map exactly?
> > > Notably, if the execution encoding is EBCDIC, is '0x06' equivalent to
> > > '0086', etc?
> > >
> > > The question "Is Unicode sufficient to represent all characters
> > > present in the input source without using the Private Use Area?" is
> > > one that is relevant to both Clang and the C/C++ standard. (I do hope
> > > that it is the case!)
> > 
> > The current goal is to make only minimal changes to the frontend to enable
> > reading of EBCDIC encoded files. For this, we use the auto-conversion
> > service of z/OS UNIX System Services
> > (https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm),
> > together with file tagging and setting the CCSID for the program and for
> > opened files. The auto-conversion service supports round-trip conversion
> > between EBCDIC and Enhanced ASCII. With it, bootstrapping with EBCDIC
> > source files is possible.
> > Of course, more complete UTF-8 support is a valid implementation
> > alternative.
> 
> Other good references:
> - The 'chtag' utility
>   https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.bpxa500/chtag.htm
> - File tagging overview
>   https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/cbc1p273.htm
> 
> Kai, would use of auto conversion require that users set the 
> _BPXK_AUTOCVT, _BPXK_CCSIDS, and/or _BPXK_PCCSID environment 
> variables?  Or do you envision having the clang driver set them 
> before invocation of the compiler?  If the latter, that would imply 
> that users (and tests) are responsible for setting them for direct 
> 'clang -cc1' invocations.

Hi Tom,
the current approach is to enable auto conversion only if _BPXK_AUTOCVT is
set to ON. If the variable is not set, then all input files are treated as
EBCDIC. The rationale is that we do not want to outsmart the user, so there
is no problem with direct `clang -cc1` invocations. It's a good hint that
we need to describe this setup somewhere.
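
As a rough illustration of the policy described above, here is a minimal
sketch of the decision in C++. The names InputHandling, classifyAutoCvt and
inputHandlingFromEnvironment are hypothetical and are not taken from any
patch. The sketch also assumes that any value other than exactly "ON" falls
back to treating input as EBCDIC, which goes slightly beyond what is stated
above (only the "ON" and "not set" cases are described).

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type: how the compiler should treat input files.
enum class InputHandling { AutoConvert, TreatAsEBCDIC };

// Map the raw value of _BPXK_AUTOCVT to a decision.
// Assumption: only the exact string "ON" enables auto conversion;
// an unset variable (nullptr) or any other value means "treat as EBCDIC".
InputHandling classifyAutoCvt(const char *Value) {
  if (Value != nullptr && std::string(Value) == "ON")
    return InputHandling::AutoConvert;
  return InputHandling::TreatAsEBCDIC;
}

// Read the variable from the process environment. Nothing is set or
// overridden here, so the user's environment is taken as-is.
InputHandling inputHandlingFromEnvironment() {
  return classifyAutoCvt(std::getenv("_BPXK_AUTOCVT"));
}
```

Because the decision depends only on the environment of the process, it
would behave the same whether the compiler is started through the driver or
directly via `clang -cc1`.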
 
> Here is another possible direction to consider that would provide a 
> more portable facility.  Clang has interfaces for overriding file 
> contents with a memory buffer; see the overrideFileContents() 
> overloads in SourceManager.  It should be straightforward to, when 
> loading a file, make a determination as to whether a conversion is 
> needed (e.g., consider file tags, environment variables, command 
> line options, etc...) and, if needed, transcode the file contents 
> and register the resulting buffer as an override.  This would be 
> useful for implementation of -finput-charset and would benefit 
> deployments in Microsoft environments that have source files in 
> ISO-8859 encodings.

That's a good hint. I'll definitely have a look at it, as it sounds like it
could solve some problems/complexity. A separate solution would then still
be required for LLVM.
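
To make the transcoding step of that suggestion concrete, here is a small
sketch for the ISO-8859-1 case only. The function name latin1ToUTF8 is
hypothetical. The sketch deliberately stops at producing the UTF-8 bytes:
creating a memory buffer from them and registering it through the
overrideFileContents() overloads is not shown, and EBCDIC code pages would
need a table-driven mapping instead.

```cpp
#include <string>

// Transcode ISO-8859-1 (Latin-1) bytes to UTF-8.
// Every Latin-1 byte value equals its Unicode code point, so bytes below
// 0x80 are copied unchanged and bytes 0x80..0xFF become two-byte sequences.
std::string latin1ToUTF8(const std::string &In) {
  std::string Out;
  Out.reserve(In.size());
  for (unsigned char C : In) {
    if (C < 0x80) {
      Out.push_back(static_cast<char>(C));
    } else {
      // Two-byte form: 110xxxxx 10xxxxxx
      Out.push_back(static_cast<char>(0xC0 | (C >> 6)));
      Out.push_back(static_cast<char>(0x80 | (C & 0x3F)));
    }
  }
  return Out;
}
```

The returned string is what would be registered as the override for the
original file, so that the rest of the frontend only ever sees UTF-8.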
 
> Tom. 

Best regards,
Kai Nacke
IT Architect



