[flang-commits] [flang] 664c111 - [flang] Always encode multi-byte output in UTF-8

Peter Klausler via flang-commits flang-commits at lists.llvm.org
Thu Apr 14 11:14:00 PDT 2022


Author: Peter Klausler
Date: 2022-04-14T11:13:51-07:00
New Revision: 664c111c958c14e3250fe9e82ba16de05fb4772f

URL: https://github.com/llvm/llvm-project/commit/664c111c958c14e3250fe9e82ba16de05fb4772f
DIFF: https://github.com/llvm/llvm-project/commit/664c111c958c14e3250fe9e82ba16de05fb4772f.diff

LOG: [flang] Always encode multi-byte output in UTF-8

A recent change that implemented UTF-8 encoding should have made
the encoding conditional only for CHARACTER(KIND=1), where the
mode selects between UTF-8 output and a single-byte encoding
such as Latin-1.  UTF-8 output of wider CHARACTER kinds should
not be conditional (at least until we choose to support UTF-16,
maybe).  As a result, wider CHARACTER kinds are currently being
emitted with extra zero bytes when the UTF-8 mode is not set;
this patch fixes that.
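
The rule described above can be illustrated with a small standalone
sketch.  It is not the runtime's actual code: SketchConnection,
AppendUTF8, and EmitSketch are hypothetical names invented for this
example, and only the body of useUTF8() is taken from the patch.  The
sketch assumes that a KIND=1 byte >= 0x80 is treated as a Latin-1 code
point when the UTF-8 mode is set, and that code points stay below
0x110000.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the runtime's ConnectionAttributes.
struct SketchConnection {
  bool isUTF8{false};
  // Same predicate as the patch: wide kinds always use UTF-8;
  // single-byte CHARACTER uses it only when the mode is set.
  template <typename CHAR = char> constexpr bool useUTF8() const {
    return sizeof(CHAR) > 1 || isUTF8;
  }
};

// Hypothetical helper: append one code point to 'out' as UTF-8.
inline void AppendUTF8(std::string &out, char32_t ch) {
  if (ch < 0x80) {
    out += static_cast<char>(ch);
  } else if (ch < 0x800) {
    out += static_cast<char>(0xC0 | (ch >> 6));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  } else if (ch < 0x10000) {
    out += static_cast<char>(0xE0 | (ch >> 12));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (ch >> 18));
    out += static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
}

// Hypothetical emitter: returns the bytes that would be written.
template <typename CHAR>
std::string EmitSketch(
    const SketchConnection &c, const CHAR *data, std::size_t chars) {
  std::string out;
  if (c.useUTF8<CHAR>()) {
    // Convert through the unsigned type to avoid sign extension.
    using UnsignedChar = std::make_unsigned_t<CHAR>;
    for (std::size_t j{0}; j < chars; ++j) {
      AppendUTF8(out, static_cast<char32_t>(static_cast<UnsignedChar>(data[j])));
    }
  } else {
    // Raw copy; only reachable for single-byte CHARACTER here.
    out.append(reinterpret_cast<const char *>(data), chars * sizeof(CHAR));
  }
  return out;
}
```

With a default SketchConnection (isUTF8 false), a char32_t buffer is
still encoded as UTF-8, so no zero padding bytes appear in the output,
while a plain char buffer is copied through unchanged.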

Differential Revision: https://reviews.llvm.org/D123711

Added: 
    

Modified: 
    flang/runtime/connection.h
    flang/runtime/edit-output.cpp
    flang/runtime/io-stmt.cpp

Removed: 
    


################################################################################
diff  --git a/flang/runtime/connection.h b/flang/runtime/connection.h
index c86b6947dbedc..09b84255b9226 100644
--- a/flang/runtime/connection.h
+++ b/flang/runtime/connection.h
@@ -34,6 +34,13 @@ struct ConnectionAttributes {
     // Formatted stream files are viewed as having records, at least on input
     return access != Access::Stream || !isUnformatted.value_or(true);
   }
+
+  template <typename CHAR = char> constexpr bool useUTF8() const {
+    // For wide CHARACTER kinds, always use UTF-8 for formatted I/O.
+    // For single-byte CHARACTER, encode characters >= 0x80 with
+    // UTF-8 iff the mode is set.
+    return sizeof(CHAR) > 1 || isUTF8;
+  }
 };
 
 struct ConnectionState : public ConnectionAttributes {

diff  --git a/flang/runtime/edit-output.cpp b/flang/runtime/edit-output.cpp
index 46d3752262587..824747ed17881 100644
--- a/flang/runtime/edit-output.cpp
+++ b/flang/runtime/edit-output.cpp
@@ -506,7 +506,7 @@ bool ListDirectedCharacterOutput(IoStatementState &io,
     // Undelimited list-directed output
     ok = ok && list.EmitLeadingSpaceOrAdvance(io, length > 0 ? 1 : 0, true);
     std::size_t put{0};
-    std::size_t oneIfUTF8{connection.isUTF8 ? 1 : length};
+    std::size_t oneIfUTF8{connection.useUTF8<CHAR>() ? 1 : length};
     while (ok && put < length) {
       if (std::size_t chunk{std::min<std::size_t>(
               std::min<std::size_t>(length - put, oneIfUTF8),

diff  --git a/flang/runtime/io-stmt.cpp b/flang/runtime/io-stmt.cpp
index caa7d29dc9222..fba77171062ef 100644
--- a/flang/runtime/io-stmt.cpp
+++ b/flang/runtime/io-stmt.cpp
@@ -477,7 +477,7 @@ bool IoStatementState::EmitEncoded(const CHAR *data0, std::size_t chars) {
   // Don't allow sign extension
   using UnsignedChar = std::make_unsigned_t<CHAR>;
   const UnsignedChar *data{reinterpret_cast<const UnsignedChar *>(data0)};
-  if (GetConnectionState().isUTF8) {
+  if (GetConnectionState().useUTF8<CHAR>()) {
     char buffer[256];
     std::size_t at{0};
     while (chars-- > 0) {


        

