[flang-commits] [PATCH] D123711: [flang] Always encode multi-byte output in UTF-8
Peter Klausler via Phabricator via flang-commits
flang-commits at lists.llvm.org
Wed Apr 13 12:35:14 PDT 2022
klausler created this revision.
klausler added a reviewer: jeanPerier.
klausler added a project: Flang.
Herald added a subscriber: jdoerfert.
Herald added a project: All.
klausler requested review of this revision.
A recent change that implemented UTF-8 encoding should have made
the encoding conditional only for CHARACTER(KIND=1), where the mode
selects between UTF-8 output and Latin-1 (or whatever) output. UTF-8
output of wider CHARACTER kinds should not be conditional (at least
until we choose to support UTF-16, maybe). As a result, wider CHARACTER
kinds are being emitted with extra zero bytes when the UTF-8 mode is
not set; this patch fixes that.
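
As a rough illustration of the behavior described above, here is a minimal
standalone sketch. It is not the runtime's actual code: ConnectionSketch,
AppendUTF8, and EmitSketch are hypothetical names invented for this example,
and only the useUTF8<CHAR>() predicate mirrors the one added by the patch.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the runtime's connection attributes.
struct ConnectionSketch {
  bool isUTF8{false};
  // Same predicate as in the patch: wide kinds always use UTF-8,
  // single-byte CHARACTER uses it only when the mode is set.
  template <typename CHAR = char> constexpr bool useUTF8() const {
    return sizeof(CHAR) > 1 || isUTF8;
  }
};

// Hypothetical helper: append one code point to 'out' as UTF-8.
inline void AppendUTF8(std::string &out, char32_t c) {
  if (c < 0x80) {
    out += static_cast<char>(c);
  } else if (c < 0x800) {
    out += static_cast<char>(0xC0 | (c >> 6));
    out += static_cast<char>(0x80 | (c & 0x3F));
  } else if (c < 0x10000) {
    out += static_cast<char>(0xE0 | (c >> 12));
    out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (c & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (c >> 18));
    out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (c & 0x3F));
  }
}

// Hypothetical emitter: encodes as UTF-8 when 'encode' is true,
// otherwise copies the in-memory bytes of each character unchanged.
template <typename CHAR>
std::string EmitSketch(const CHAR *data, std::size_t chars, bool encode) {
  // Don't allow sign extension
  using UnsignedChar = std::make_unsigned_t<CHAR>;
  std::string out;
  if (encode) {
    for (std::size_t j{0}; j < chars; ++j) {
      AppendUTF8(out, static_cast<char32_t>(static_cast<UnsignedChar>(data[j])));
    }
  } else {
    out.append(reinterpret_cast<const char *>(data), chars * sizeof(CHAR));
  }
  return out;
}
```

With isUTF8 unset, passing conn.isUTF8 as the last argument (the old
condition) copies a char32_t element as four raw bytes, three of them zero
for a code point below 0x100, while passing conn.useUTF8<char32_t>() (the new
condition) produces the UTF-8 sequence. For plain char, the result still
depends on the mode.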
https://reviews.llvm.org/D123711
Files:
flang/runtime/connection.h
flang/runtime/edit-output.cpp
flang/runtime/io-stmt.cpp
Index: flang/runtime/io-stmt.cpp
===================================================================
--- flang/runtime/io-stmt.cpp
+++ flang/runtime/io-stmt.cpp
@@ -485,7 +485,7 @@
// Don't allow sign extension
using UnsignedChar = std::make_unsigned_t<CHAR>;
const UnsignedChar *data{reinterpret_cast<const UnsignedChar *>(data0)};
- if (GetConnectionState().isUTF8) {
+ if (GetConnectionState().useUTF8<CHAR>()) {
char buffer[256];
std::size_t at{0};
while (chars-- > 0) {
Index: flang/runtime/edit-output.cpp
===================================================================
--- flang/runtime/edit-output.cpp
+++ flang/runtime/edit-output.cpp
@@ -506,7 +506,7 @@
// Undelimited list-directed output
ok = ok && list.EmitLeadingSpaceOrAdvance(io, length > 0 ? 1 : 0, true);
std::size_t put{0};
- std::size_t oneIfUTF8{connection.isUTF8 ? 1 : length};
+ std::size_t oneIfUTF8{connection.useUTF8<CHAR>() ? 1 : length};
while (ok && put < length) {
if (std::size_t chunk{std::min<std::size_t>(
std::min<std::size_t>(length - put, oneIfUTF8),
Index: flang/runtime/connection.h
===================================================================
--- flang/runtime/connection.h
+++ flang/runtime/connection.h
@@ -34,6 +34,13 @@
// Formatted stream files are viewed as having records, at least on input
return access != Access::Stream || !isUnformatted.value_or(true);
}
+
+ template <typename CHAR = char> constexpr bool useUTF8() const {
+ // For wide CHARACTER kinds, always use UTF-8 for formatted I/O.
+ // For single-byte CHARACTER, encode characters >= 0x80 with
+ // UTF-8 iff the mode is set.
+ return sizeof(CHAR) > 1 || isUTF8;
+ }
};
struct ConnectionState : public ConnectionAttributes {