[llvm] Create a CharSetConverter class with both iconv and icu support (PR #74516)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 4 08:11:55 PST 2024
================
@@ -0,0 +1,160 @@
+//===-- CharSet.h - Utility class to convert between char sets ----*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file provides a utility class to convert between different character
+/// set encodings.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_CHARSET_H
+#define LLVM_SUPPORT_CHARSET_H
+
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Config/config.h"
+#include "llvm/Support/ErrorOr.h"
+
+#include <functional>
+#include <string>
+#include <system_error>
+
+namespace llvm {
+
+template <typename T> class SmallVectorImpl;
+
+namespace details {
+class CharSetConverterImplBase {
+public:
+ virtual ~CharSetConverterImplBase() = default;
+
+ /// Converts a string.
+ /// \param[in] Source source string
+ /// \param[in,out] Result container for converted string
+ /// \param[in] ShouldAutoFlush Append shift-back sequence after conversion
+ /// for multi-byte encodings iff true.
+ /// \return error code in case something went wrong
+ ///
+ /// The following error codes can occur, among others:
+ /// - std::errc::argument_list_too_long: The result requires more than
+ /// std::numeric_limits<size_t>::max() bytes.
+ /// - std::errc::illegal_byte_sequence: The input contains an invalid
+ /// multibyte sequence.
+ /// - std::errc::invalid_argument: The input contains an incomplete
+ /// multibyte sequence.
+ ///
+ /// In case of an error, the result string contains the successfully converted
+ /// part of the input string.
+ ///
+
+ virtual std::error_code convert(StringRef Source,
+ SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush) const = 0;
+
+ /// Restore the conversion to the original state.
+ /// \return error code in case something went wrong
+ ///
+ /// If the original character set or the destination character set
+ /// are multi-byte character sets, set the shift state to the initial
+ /// state. Otherwise this is a no-op.
+ virtual std::error_code flush() const = 0;
----------------
cor3ntin wrote:
Why does the source character set have any impact here? (And again, multi-byte -> stateful everywhere.)
`resetShiftState()` may also be a better name.
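
To illustrate the naming point, here is a minimal sketch of a toy stateful encoder. Everything in it is hypothetical and is not the API proposed in this PR: `ToyStatefulEncoder`, its made-up encoding (bytes >= 0x80 are written in a "shifted" mode bracketed by SO 0x0E / SI 0x0F), and `demoResetShiftState()` are assumptions for illustration only. It shows only the output side, where the shift-back sequence is written to the destination; it does not model a stateful source encoding, where a decoder would carry its own state.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy encoder, NOT LLVM's CharSetConverter.
// Assumed encoding: input bytes >= 0x80 are emitted with the high bit cleared
// while the output is in "shifted" mode; SO (0x0E) enters that mode and
// SI (0x0F) leaves it.
class ToyStatefulEncoder {
  bool Shifted = false; // shift state of the output stream

public:
  // Appends the encoded form of Source to Result. The shift state is left
  // wherever the last byte put it.
  void convert(const std::string &Source, std::string &Result) {
    for (unsigned char C : Source) {
      bool NeedShift = C >= 0x80;
      if (NeedShift != Shifted) {
        Result.push_back(NeedShift ? '\x0E' : '\x0F');
        Shifted = NeedShift;
      }
      Result.push_back(static_cast<char>(C & 0x7F));
    }
  }

  // Returns the output to the initial state, appending SI only if the
  // stream is currently shifted. Otherwise this is a no-op.
  void resetShiftState(std::string &Result) {
    if (Shifted) {
      Result.push_back('\x0F');
      Shifted = false;
    }
  }
};

// Small usage check for the sketch above.
inline void demoResetShiftState() {
  ToyStatefulEncoder Enc;
  std::string Out;
  Enc.convert("a\xC1", Out);
  // The output ends in shifted mode until the state is reset.
  assert(Out == std::string("a\x0E" "A"));
  Enc.resetShiftState(Out);
  assert(Out == std::string("a\x0E" "A\x0F"));
}
```

In this sketch the method name describes what happens (the shift state is reset) rather than a generic "flush", and whether anything is emitted depends on the encoding being stateful, not merely multi-byte.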
https://github.com/llvm/llvm-project/pull/74516