[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang

Kai Peter Nacke via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 16 06:16:42 PDT 2020


Hubert Tong <hubert.reinterpretcast at gmail.com> wrote on 11.06.2020 22:53:14:

> The intention is to use the auto-conversion feature from the
> language environment. Currently, this platform feature does not
> handle conversions of multi-byte encodings, so at this time
> consumption of UTF-8 encoded source files is not possible.
> If the internal representation is still UTF-8, consuming UTF-8 
> should involve not converting. It is sounding like the internal 
> representation has been changed to ISO-8859-1 in order to support 
> characters outside those in US-ASCII. If it is indeed internally 
> fixed to ISO-8859-1, then the question of future support for non-
> Latin (e.g., Greek or Cyrillic) scripts arises. It may be a better 
> tradeoff to leave the internal representation as UTF-8 and restrict 
> the support to the US-ASCII subset for now.

The intention is to initially restrict the support to the US-ASCII subset. 
This enables compiling with EBCDIC-encoding files and does not exclude 
further development for true UTF-8 support.
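
To illustrate what the US-ASCII restriction amounts to, here is a minimal Python sketch. It is not the actual implementation: the helper name `to_internal` is hypothetical, and the codec `cp037` is only a stand-in for the z/OS code page, since Python's standard library does not ship IBM-1047.

```python
# Hypothetical sketch, not the code from the z/OS port.
# Assumption: cp037 stands in for the EBCDIC code page used on z/OS.
EBCDIC_CODEC = "cp037"


def to_internal(raw: bytes) -> str:
    """Decode single-byte EBCDIC and accept only the US-ASCII subset."""
    text = raw.decode(EBCDIC_CODEC)
    # One input byte maps to one character, so the index is also the byte offset.
    for offset, ch in enumerate(text):
        if ord(ch) > 0x7F:
            raise ValueError(
                f"non-ASCII character U+{ord(ch):04X} at byte offset {offset}"
            )
    return text


source = "int main() { return 0; }".encode(EBCDIC_CODEC)
print(to_internal(source))
```

Anything outside US-ASCII is rejected rather than silently mapped, which leaves room for a later extension to full UTF-8 support.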
  
> For the same reason, this does not enable the consumption of
> non-UTF-8 encoded source files on other platforms.

Yes, because a platform-specific feature is used, it does not enable 
reading of non-UTF-8 encoded files on other platforms.

> Thanks Kai for clarifying. I think this direction leads to some 
> questions around testing.
> 
> The auto-conversion feature makes use of some filesystem-specific 
> features such as filetags that indicate the associated coded 
> character set. In terms of the testing environment on a z/OS system 
> under USS, will there be documentation or scripts available for 
> establishing the necessary file properties on the local tree? It 
> also sounds like there would be some tests that are specific to z/
> OS-hosted builds that test the conversion facilities.

A git clone under z/OS USS automatically tags the files as Latin-1, so no 
further setup is required.
We also have some tests that exercise the text conversion. Of course, these 
only run on z/OS USS because they use the conversion service.
 
> Also, if the platform feature does not handle conversions of multi-
> byte encodings, I am wondering if alternative mechanisms (such as 
> iconv) have been investigated. I suppose there is an issue over how 
> source positions are determined; however, I do not see how an 
> extension of the autoconversion facility would avoid the said issue.

We have not yet investigated alternative mechanisms for converting file 
data. The first obvious complication is deciding where to do the conversion. 
With the source locations identified, other conversion approaches are 
imaginable. Of course, converting on the fly poses some challenges, such as 
the one you mentioned.
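
As a rough illustration of the source-position issue, the following Python sketch compares byte offsets after converting a single-byte EBCDIC buffer to Latin-1 and to UTF-8. It is only a sketch under assumptions: `offsets_after_conversion` is a hypothetical helper, and `cp037` again stands in for the z/OS code page.

```python
# Hypothetical sketch of how conversion can shift byte offsets.
# Assumption: cp037 stands in for the EBCDIC code page used on z/OS.
EBCDIC_CODEC = "cp037"


def offsets_after_conversion(raw: bytes, target: str) -> list[int]:
    """For each source byte, return its byte offset in the converted buffer."""
    offsets = []
    position = 0
    for byte in raw:
        offsets.append(position)
        ch = bytes([byte]).decode(EBCDIC_CODEC)
        # A single source byte may become several bytes in the target encoding.
        position += len(ch.encode(target))
    return offsets


raw = 'char c = "é";'.encode(EBCDIC_CODEC)
print(offsets_after_conversion(raw, "latin-1"))
print(offsets_after_conversion(raw, "utf-8"))
```

With a single-byte target the offsets match the source one to one. With UTF-8, every byte after the non-ASCII character shifts, so a mapping between converted and original positions would be needed.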

Best regards,
Kai Nacke
IT Architect



