[libcxx-commits] [PATCH] D118851: [SystemZ]:[z/OS]:[libcxx]: fix isascii function to work for z/OS

Arthur O'Dwyer via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Tue Feb 22 08:00:59 PST 2022


Quuxplusone added inline comments.


================
Comment at: libcxx/include/__locale:685
+#if defined(__MVS__)
+        return __c ? true : true;
+#else
----------------
NancyWang2222 wrote:
> ldionne wrote:
> > NancyWang2222 wrote:
> > > NancyWang2222 wrote:
> > > > Mordante wrote:
> > > > > Is this intended to always return true, if you please just return `true`. You can use a `(void) __c;` to silence compiler diagnostics.
> > > > clang will emit error if argument __c is not used in function. that is why I checked __c. 
> > > I will add void(__c); 
> > Why is this intended to always return true?
> isascii function is not right test for z/OS,  char type is between [0-255] which is valid character  indexing in __tab function on z/OS for both ASCII and EBCDIC. 
IIUC now (based on @SeanP's comment below), `__is_valid_in_encoding(x)` means no more or less than "`0 <= x && x < N`, where `N` is the number of elements in the array that was passed to the constructor on line 681." For most platforms this `N` is 128; for z/OS in particular, this `N` is 256.

Is this `N` always the same as `ctype::table_size` from lines 777–779? What goes wrong if we make this function simply
```
static bool __is_in_range(size_t __c) { return __c < table_size; }
```
? See
https://en.cppreference.com/w/cpp/locale/ctype_char/classic_table
https://en.cppreference.com/w/cpp/locale/ctype_char/table

Personally (although @ldionne is welcome to overrule me) I'd like to move away from the existing code's use of `isascii(x)` as a shorthand for `x < 128`, because this bounds check doesn't really have anything to do with ASCII, and it's silly to have to read a man page just to find out it means `x < 128`. If we mean `x < table_size`, we should just say that.


================
Comment at: libcxx/src/locale.cpp:1029
+      *low =
+          __is_valid_in_encoding(*low) ? static_cast<char>(__classic_upper_table()[static_cast<size_t>(*low)]) : *low;
 #else
----------------
SeanP wrote:
> Quuxplusone wrote:
> > Changes like this one make me nervous. The original purpose of `isascii(x)` seems to have been a bounds check: it says "yes, `x` is in the range 0 to 127 inclusive." That doesn't really have anything to do with the ASCII versus EBCDIC question — hence the mass renaming to `__is_valid_in_encoding`, on which I'm neutral. However, it is //very important// to all of these table lookups: If `x` isn't in the range 0 to 127, e.g. if x is `-1` or `10000`, then these lookups will have UB.
> > 
> > Can you explain why you want to make this change? Specifically, what behaves wrongly on your platform if we //don't do any of this?//
> The tables on z/OS have the range 0-255.  This is true for both ASCII and EBCDIC.  The tables need to be 0-255 since EBCDIC characters are spread over this entire range (eg. '9' is 0xf9).  The system uses the same range in for the ASCII tables which is why the code uses 0-255 for z/OS.
Ah, thanks. I've resumed the comment thread on `__is_valid_in_encoding` above.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118851/new/

https://reviews.llvm.org/D118851



More information about the libcxx-commits mailing list