[PATCH] D88741: [SystemZ/z/OS] Add utility class for char set conversion.

Tue Oct 19 07:59:30 PDT 2021

Kai added inline comments.

================
Comment at: llvm/include/llvm/Support/CharSet.h:32
+namespace text_encoding {
+enum class id {
+  /// UTF-8 character set encoding.
----------------
hubert.reinterpretcast wrote:
> ZarkoCA wrote:
> > I think enums should start with an uppercase letter as per the style guide.
> Thanks @ZarkoCA. The coding standards don't state this, but the existing practice is that the exception (from the coding standards) about following the C++ STL naming convention in some cases is more broadly applicable for components with a corresponding standard (or proposed standard) facility like this one.
To be precise, the exception is documented here: https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly (at the end of the paragraph).

================
Comment at: llvm/lib/Support/CharSet.cpp:179-183
+  return CharSetConverter{
+      [Table, Flags](StringRef Source, SmallVectorImpl<char> &Result) {
+        return convertWithTable(Table, Flags, Source, Result);
+      },
+      nullptr};
----------------
hubert.reinterpretcast wrote:
> Because of the [U+0000, U+00FF] limitation of the `convertWithTable` converter, we should not be getting here if there is a no-op conversion from UTF-8 to UTF-8. Either a no-op converter should be returned, or a request for such a converter is erroneous (`report_fatal_error` here would be friendlier by failing fast).
Added a no-op converter.

================
Comment at: llvm/lib/Support/CharSet.cpp:104
+        if (Ch != 0xc2 && Ch != 0xc3)
+          return std::make_error_code(std::errc::illegal_byte_sequence);
+        // Is buffer truncated?
----------------
hubert.reinterpretcast wrote:
> The API contract between the table conversion and the `iconv` conversion is inconsistent. `iconv` conversion performs an implementation-defined conversion for valid input characters that does not have a representation in the output codeset. This implementation fails the conversion instead.
Changed implementation to have same behavior as iconv.

================
Comment at: llvm/lib/Support/CharSet.cpp:140
+  size_t Capacity = Result.capacity();
+  Result.resize(Capacity);
+  char *Output = static_cast<char *>(Result.data());
----------------
hubert.reinterpretcast wrote:
> This looks like a use case for `resize_for_overwrite`.
Changed, thanks.

================
Comment at: llvm/lib/Support/CharSet.cpp:152
+      Capacity *= 2;
+      Result.resize(Capacity);
+      Output = static_cast<char *>(Result.data()) + Used;
----------------
hubert.reinterpretcast wrote:
> Same comment about `resize_for_overwrite`.
Changed, thanks.

================
Comment at: llvm/lib/Support/CharSet.cpp:194
+  iconv_t ConvDesc = iconv_open(CSTo.str().c_str(), CSFrom.str().c_str());
+  if (ConvDesc == reinterpret_cast<iconv_t>(-1))
+    return std::error_code(errno, std::generic_category());
----------------
hubert.reinterpretcast wrote:
> Use a C-style cast if necessary, but `reinterpret_cast` is wrong here.
Changed.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88741/new/

https://reviews.llvm.org/D88741