[PATCH] D88741: [SystemZ/z/OS] Add utility class for char set conversion.

Corentin Jabot via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 30 01:27:15 PDT 2023


cor3ntin added subscribers: tstellar, aaron.ballman.
cor3ntin added a comment.

Thanks for working on this.

Before starting a more in depth review on that, I think this is big enough that we want to see an RFC with wider consensus and interest from the community as far as maintaining this.
I know of https://discourse.llvm.org/t/rfc-adding-a-char-set-converter-to-support-library/56592/2 and https://discourse.llvm.org/t/rfc-adding-a-char-set-converter-to-support-library/56574 , but neither gather a lot of comments.

I would like to better understand the set of circumstances in which  the system iconv's IBM1047 transcoder would not be suitable for use by llvm, on the systems that make use of that encoding. 
Maintaining these tables won't be free. Are the discrepancies regarding LF handling something you expect to cause issues in practice?

I am of the opinion that we should rely on iconv as much as possible, as i do not think maintaining conversion table will be a good use of our resources, however i think people might have widely different opinions and you will want 
to make sure there is community buy-in on that point.

We should note that GCC has had good success using iconv, and I don't think the platform-specific availability of encondings has been an issue in practice but it's worth addressing. I know some people mentioned that point previously.
IMO, Any user confusion in that regard could be addressed by providing a sufficiently high-quality diagnostic in the presence of unsupported encoding.

If we do use iconv though, i would like us to have a better understanding of use cases, The patch currently links `iconv` to all llvm libraries, which might be overkill if the only project using it is Clang, and I wonder how that affects packaging 
on linux distributions.

Speaking of, maybe the LLVM foundation will be able to provide legal opinions on linking to `iconv` on various platforms. Maybe something @tstellar  would be able to help with. I would not expect particular challenges but we
should to make sure.

@aaron.ballman  also reminded me that if `iconv` is generally available on posix-ish platforms, it is usually not available on Windows. Building iconv-like facilities on top of win32 APIs may be challenging, but we might want to think about
ways to allow or facilitate the use of iconv in Windows build. At least i would like us to give some consideration to that point, Indeed, Windows would, logically be the other big motivation to teach Clang about encodings beyond UTF-8.

Are there any other supported platforms for which iconv availability would be a concern?

I think all of these points should be addressed in an RFC so that we have a clear plan moving forward.

On a slightly more technical aspects, I'm concerned about the attempt at providing genericity  in the `CharSetConverterTable` class - as Hubert alluded to, a table of bytes only works for specific, stateless, single byte encoding that have a reasonable mapping to one another.
If we do agree that we want a homegrown IBM1047<->UTF-8 conversion class, maybe we should have a class that does only that instead of trying to future proof and expect to add more tables. As i hope that we won't try to support more encoding without iconv.

`CharSetConverterIdentity` will never be a very efficient thing to use as implemented as it will perform a copy that may not be be useful. Have you consider either asserting that the encoding are distincts, or providing an error in that case instead of making a no-op copying converter?



================
Comment at: llvm/include/llvm/Support/CharSet.h:93
+class CharSetConverter {
+  details::CharSetConverterImplBase *Converter;
+
----------------
Please use `std::unique_ptr` here, that way you won't have to manually manage memory in the destructor and move assignment operator


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88741/new/

https://reviews.llvm.org/D88741



More information about the llvm-commits mailing list