[libcxx-commits] [PATCH] D111323: [SystemZ][z/OS] ASCII/EBCDIC support for libc++

Thu Nov 11 16:13:31 PST 2021

ldionne added a comment.

Thanks for the detailed design document, this was really helpful in understanding what this patch is about.

However, I share @Quuxplusone 's concerns. This is adding a lot of complexity, especially the part about deciding whether a function should be in an encoding-dependent namespace or not. This means that whenever we add new code, we'll need to ask ourselves the question "oh, should this be in a CES namespace or not"? This is on the same level of annoyance as the `_LIBCPP_INLINE_VISIBILITY` macro we have everywhere (but wait, not quite **everywhere**, which is why it's so evil), which is an enormous mess and a barrier to contributions, a source of bugs, and a source of confusion all the time. I am concerned by the approach taken here, as I feel it's adding just another `_LIBCPP_INLINE_VISIBILITY`.

I suspect there are alternate solutions to supporting multiple encodings without mangling the library like this. For starters, it seems like the namespacing is primarily aimed at reducing code duplication when you link two libc++s built with different character encodings, instead of using the existing knob for tweaking the namespace, i.e. defining `_LIBCPP_ABI_NAMESPACE`. I believe you could build the whole library twice, once with `_LIBCPP_ABI_NAMESPACE=__ascii` and once with `_LIBCPP_ABI_NAMESPACE=__ebcdic` instead. You'd end up with more code duplication, however I believe it would be vastly more beneficial and general to solve this problem by working on enabling better code de-duplication at the optimizer level. I know there has been a lot of work on outlining in LLVM, for example, perhaps that could be investigated? IOW: I think libc++ already gives you the knob you need for the namespace changes you're making here -- they might not be granular enough, however mangling the library isn't the right way to get what you need.

Regarding the string literals, I don't have as big an issue with those -- it looks roughly like an attempt to localize those error messages, which is not convenient but at least it is systematic. However, I don't understand why the compiler isn't performing those translations itself, since it knows which encoding we're compiling for.

To make progress on this:

1. Have you tried using `_LIBCPP_ABI_NAMESPACE`?
2. Is it possible for the compiler to perform the encoding translation for string literals? It would not only be simpler, but it would also solve issues more generally for your users who want to use alternate encodings, not only in libc++/libc++abi.

TL;DR: These changes are quite intrusive and mangling the library like this doesn't strike me as the right approach, especially since I think we already have general knobs to solve some of these issues. It adds a lot of complexity and room for mistakes for the benefit of a single platform. I would like to investigate alternate ways to solve your problems, with some possible avenues to explore laid out above.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111323/new/

https://reviews.llvm.org/D111323