[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang

Thu Jun 11 13:53:14 PDT 2020

On Thu, Jun 11, 2020 at 12:07 PM Kai Peter Nacke <kai.nacke at de.ibm.com>
wrote:

> Hubert Tong <hubert.reinterpretcast at gmail.com> wrote on 10.06.2020
> 23:51:54:
>
> > From: Hubert Tong <hubert.reinterpretcast at gmail.com>
> > To: Kai Peter Nacke <kai.nacke at de.ibm.com>
> > Cc: llvm-dev <llvm-dev at lists.llvm.org>
> > Date: 10.06.2020 23:52
> > Subject: [EXTERNAL] Re: [llvm-dev] RFC: Adding support for the z/OS
> > platform to LLVM and clang
> >
> > On Wed, Jun 10, 2020 at 3:11 PM Kai Peter Nacke via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> > > input source files. This would be done at the file open time to allow
> > > the rest of Clang to operate as if the source was UTF-8 and so require
> > > no changes downstream. Feedback on this plan is welcome from the Clang
> > > community.
> > Is there a statement that can be made with respect to accepting
> > UTF-8 encoded source files in a z/OS hosted environment or is it
> > implied that it works with no changes (and there are no changes that
> > will break this functionality)?
> >
> > Also, would these changes enable the consumption of non-UTF-8
> > encoded source files on Clang as hosted on other platforms?
>
> The intention is to use the auto-conversion feature from the
> language environment. Currently, this platform feature does not
> handle conversions of multi-byte encodings, so at this time
> consumption of UTF-8 encoded source files is not possible.
>
If the internal representation is still UTF-8, consuming UTF-8 should
involve no conversion at all. It sounds as though the internal representation
has been changed to ISO-8859-1 in order to support characters outside
US-ASCII. If it is indeed fixed internally to ISO-8859-1, then the question
of future support for non-Latin (e.g., Greek or Cyrillic) scripts arises.
It may be a better tradeoff to leave the internal representation as UTF-8
and restrict support to the US-ASCII subset for now.
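
As a rough illustration of that tradeoff (a sketch only; the helper names
isUSASCII and latin1ToUTF8 are hypothetical and not taken from Clang or the
proposed patches): bytes in the US-ASCII range are encoded identically in
ISO-8859-1 and UTF-8, so input restricted to that subset needs no conversion
to be valid UTF-8, whereas every ISO-8859-1 byte at or above 0x80 becomes a
two-byte sequence in UTF-8.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: true if every byte lies in the US-ASCII range
// (0x00-0x7F). Such input is byte-for-byte identical in ISO-8859-1 and UTF-8.
bool isUSASCII(std::string_view bytes) {
  for (unsigned char c : bytes)
    if (c > 0x7F)
      return false;
  return true;
}

// Hypothetical helper: transcode ISO-8859-1 to UTF-8. Each ISO-8859-1 byte is
// the Unicode code point of the same value, so bytes >= 0x80 map to the
// two-byte UTF-8 form 110xxxxx 10xxxxxx.
std::string latin1ToUTF8(std::string_view in) {
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}
```

Under this sketch, a buffer for which isUSASCII returns true can be handed to
a UTF-8 consumer unchanged; only the non-ASCII case requires a length-changing
conversion.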

> For the same reason, this does not enable the consumption of
> non-UTF-8 encoded source files on other platforms.
>

Thanks Kai for clarifying. I think this direction leads to some questions
around testing.

The auto-conversion feature relies on filesystem-specific features such as
file tags, which indicate the coded character set associated with a file. In
terms of the testing environment on a z/OS system under USS, will there be
documentation or scripts available for establishing the necessary file
properties on the local tree? It also sounds like there would be some tests
specific to z/OS-hosted builds that exercise the conversion facilities.

Also, if the platform feature does not handle conversions of multi-byte
encodings, I am wondering if alternative mechanisms (such as iconv) have
been investigated. I suppose there is an issue over how source positions
are determined; however, I do not see how an extension of the
auto-conversion facility would avoid that issue.
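
To make the source-position concern concrete (this is one reading of it, and
the helper name utf8OffsetsForLatin1 is hypothetical): once a conversion can
change the number of bytes per character, a byte offset in the converted
buffer no longer equals the byte offset in the file on disk, whichever
facility performs the conversion. A minimal sketch for ISO-8859-1 input
converted to UTF-8:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: for ISO-8859-1 input that is transcoded to UTF-8,
// return the UTF-8 byte offset at which each original byte begins. The final
// element is the total length of the converted buffer.
std::vector<std::size_t> utf8OffsetsForLatin1(std::string_view in) {
  std::vector<std::size_t> offsets;
  offsets.reserve(in.size() + 1);
  std::size_t pos = 0;
  for (unsigned char c : in) {
    offsets.push_back(pos);
    // Bytes below 0x80 stay one byte; all others expand to two bytes.
    pos += (c < 0x80) ? 1 : 2;
  }
  offsets.push_back(pos);
  return offsets;
}
```

For pure US-ASCII input the table is the identity mapping; after the first
non-ASCII byte the two sets of offsets diverge, so any position reported
against the original file would need a translation of this kind.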