[llvm] Create a CharSetConverter class with both iconv and icu support (PR #74516)

Hubert Tong via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 29 11:28:07 PDT 2024


================
@@ -0,0 +1,136 @@
+//===-- CharSet.h - Characters set conversion class ---------------*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file provides a utility class to convert between different character
+/// set encodings.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_CHARSET_H
+#define LLVM_SUPPORT_CHARSET_H
+
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Config/config.h"
+#include "llvm/Support/ErrorOr.h"
+
+#include <functional>
+#include <string>
+#include <system_error>
+
+namespace llvm {
+
+template <typename T> class SmallVectorImpl;
+
+namespace details {
+class CharSetConverterImplBase {
+public:
+  virtual ~CharSetConverterImplBase() = default;
+
+  /// Converts a string.
+  /// \param[in] Source source string
+  /// \param[out] Result container for converted string
+  /// \param[in] ShouldAutoFlush Append shift-back sequence after conversion
+  /// for stateful encodings if true.
----------------
hubert-reinterpretcast wrote:

> where we do eventually want to properly support stateful encodings

The question comes down to whether we can move to such "proper" support when ecosystem concerns come into play. Once there is a C++ implementation on the system that we need to interoperate with, we need the length of the string literal to be consistent with that implementation.

GCC seems to generate the "extra" shift sequences.

https://github.com/llvm/llvm-project/pull/74516


More information about the llvm-commits mailing list