[flang-commits] [flang] 4d42e16 - [flang] runtime: fix problems with I/O around EOF & delimited characters

peter klausler via flang-commits flang-commits at lists.llvm.org
Fri Jul 23 18:48:02 PDT 2021


Author: peter klausler
Date: 2021-07-23T18:23:26-07:00
New Revision: 4d42e16eb8f7bb1484a1970324aa3391809188de

URL: https://github.com/llvm/llvm-project/commit/4d42e16eb8f7bb1484a1970324aa3391809188de
DIFF: https://github.com/llvm/llvm-project/commit/4d42e16eb8f7bb1484a1970324aa3391809188de.diff

LOG: [flang] runtime: fix problems with I/O around EOF & delimited characters

When a WRITE overwrites an endfile record, we need to forget
that there was an endfile record.  When doing a BACKSPACE
after an explicit ENDFILE statement, the position afterwards
must be upon the endfile record.

Attempts to join list-directed delimited character input across
record boundaries was due to a bad reading of the standard
and has been deleted, now that the requirements are better understood.
This problem would cause a read attempt past EOF if a delimited
character input value was at the end of a record.

It turns out that delimited list-directed (and NAMELIST) character
output is required to emit contiguous doubled instances of the
delimiter character when it appears in the output value.  When
fixed-size records are being emitted, as is the case with internal
output, this is not possible when the problematic character falls
on the last position of a record.  No two other Fortran compilers
do the same thing in this situation so there is no good precedent
to follow.

Because it seems least wrong, with this patch we now emit one copy
of the delimiter as the last character of the current record and
another as the first character of the next record.  (The
second-least-wrong alternative might be to flag a runtime error,
but that seems harsh since it's not an explicit error in the standard,
and the output may not have to be usable later as input anyway.)
Consequently, the output is not suitable for use as list-directed or
NAMELIST input.

If a later standard were to clarify this case, this behavior will of
course change as needed to conform.

Differential Revision: https://reviews.llvm.org/D106695

Added: 
    

Modified: 
    flang/docs/Extensions.md
    flang/runtime/edit-input.cpp
    flang/runtime/edit-output.cpp
    flang/runtime/unit.cpp
    flang/unittests/RuntimeGTest/Namelist.cpp

Removed: 
    


################################################################################
diff  --git a/flang/docs/Extensions.md b/flang/docs/Extensions.md
index e55be8b1bc948..49295ae430bd9 100644
--- a/flang/docs/Extensions.md
+++ b/flang/docs/Extensions.md
@@ -35,6 +35,29 @@ accepted if enabled by command-line options.
 * We are not strict on the contents of `BLOCK DATA` subprograms
   so long as they contain no executable code, no internal subprograms,
   and allocate no storage outside a named `COMMON` block.  (C1415)
+* Delimited list-directed (and NAMELIST) character output is required
+  to emit contiguous doubled instances of the delimiter character
+  when it appears in the output value.  When fixed-size records
+  are being emitted, as is the case with internal output, this
+  is not possible when the problematic character falls on the last
+  position of a record.  No two other Fortran compilers do the same
+  thing in this situation so there is no good precedent to follow.
+  Because it seems least wrong, we emit one copy of the delimiter as
+  the last character of the current record and another as the first
+  character of the next record.  (The second-least-wrong alternative
+  might be to flag a runtime error, but that seems harsh since it's
+  not an explicit error in the standard, and the output may not have
+  to be usable later as input anyway.)
+  Consequently, the output is not suitable for use as list-directed or
+  NAMELIST input.  If a later standard were to clarify this case, this
+  behavior will change as needed to conform.
+```
+character(11) :: buffer(3)
+character(10) :: quotes = '""""""""""'
+write(buffer,*,delim="QUOTE") quotes
+print "('>',a10,'<')", buffer
+end
+```
 
 ## Extensions, deletions, and legacy features supported by default
 

diff  --git a/flang/runtime/edit-input.cpp b/flang/runtime/edit-input.cpp
index 6ecbc164bc6f9..06f5b4dd6b9d3 100644
--- a/flang/runtime/edit-input.cpp
+++ b/flang/runtime/edit-input.cpp
@@ -381,31 +381,12 @@ static bool EditDelimitedCharacterInput(
     }
     io.HandleRelativePosition(1);
     if (*ch == delimiter) {
-      if (auto next{io.GetCurrentChar()}) {
-        if (*next == delimiter) {
-          // Repeated delimiter: use as character value
-          io.HandleRelativePosition(1);
-        } else { // closing delimiter
-          break;
-        }
-      } else { // delimiter was at the end of the record
-        if (length > 0) {
-          // Look ahead on next record: if it begins with the delimiter,
-          // treat it as a split character value, ignoring both delimiters
-          ConnectionState &connection{io.GetConnectionState()};
-          auto position{connection.positionInRecord};
-          if (io.AdvanceRecord()) {
-            if (auto next{io.GetCurrentChar()}; next && *next == delimiter) {
-              // Character constant split over a record boundary
-              io.HandleRelativePosition(1);
-              continue;
-            }
-            // Not a character value split over a record boundary.
-            io.BackspaceRecord();
-            connection.HandleAbsolutePosition(position);
-          }
-        }
-        break;
+      auto next{io.GetCurrentChar()};
+      if (next && *next == delimiter) {
+        // Repeated delimiter: use as character value
+        io.HandleRelativePosition(1);
+      } else {
+        break; // closing delimiter
       }
     }
     if (length > 0) {

diff  --git a/flang/runtime/edit-output.cpp b/flang/runtime/edit-output.cpp
index 2c5803bb59579..7a5130756d90a 100644
--- a/flang/runtime/edit-output.cpp
+++ b/flang/runtime/edit-output.cpp
@@ -440,21 +440,30 @@ bool ListDirectedDefaultCharacterOutput(IoStatementState &io,
   if (modes.delim) {
     ok = ok && list.EmitLeadingSpaceOrAdvance(io);
     // Value is delimited with ' or " marks, and interior
-    // instances of that character are doubled.  When split
-    // over multiple lines, delimit each lines' part.
+    // instances of that character are doubled.
     ok = ok && io.Emit(&modes.delim, 1);
-    for (std::size_t j{0}; j < length; ++j) {
-      if (connection.NeedAdvance(2)) {
-        ok = ok && io.Emit(&modes.delim, 1) && io.AdvanceRecord() &&
-            io.Emit(&modes.delim, 1);
+    auto EmitOne{[&](char ch) {
+      if (connection.NeedAdvance(1)) {
+        ok = ok && io.AdvanceRecord();
       }
+      ok = ok && io.Emit(&ch, 1);
+    }};
+    for (std::size_t j{0}; j < length; ++j) {
+      // Doubled delimiters must be put on the same record
+      // in order to be acceptable as list-directed or NAMELIST
+      // input; however, this requirement is not always possible
+      // when the records have a fixed length, as is the case with
+      // internal output.  The standard is silent on what should
+      // happen, and no two extant Fortran implementations do
+      // the same thing when tested with this case.
+      // This runtime splits the doubled delimiters across
+      // two records for lack of a better alternative.
       if (x[j] == modes.delim) {
-        ok = ok && io.EmitRepeated(modes.delim, 2);
-      } else {
-        ok = ok && io.Emit(&x[j], 1);
+        EmitOne(x[j]);
       }
+      EmitOne(x[j]);
     }
-    ok = ok && io.Emit(&modes.delim, 1);
+    EmitOne(modes.delim);
   } else {
     // Undelimited list-directed output
     ok = ok &&

diff  --git a/flang/runtime/unit.cpp b/flang/runtime/unit.cpp
index 8e870c00637bd..ab48ff218dff3 100644
--- a/flang/runtime/unit.cpp
+++ b/flang/runtime/unit.cpp
@@ -413,10 +413,12 @@ void ExternalFileUnit::FinishReadingRecord(IoErrorHandler &handler) {
         recordOffsetInFrame_ = sizeof(std::uint32_t);
         recordLength.reset();
       } else { // formatted
-        if (Frame()[recordOffsetInFrame_ + *recordLength] == '\r') {
+        if (FrameLength() > recordOffsetInFrame_ + *recordLength &&
+            Frame()[recordOffsetInFrame_ + *recordLength] == '\r') {
           ++recordOffsetInFrame_;
         }
-        if (Frame()[recordOffsetInFrame_ + *recordLength] == '\n') {
+        if (FrameLength() >= recordOffsetInFrame_ &&
+            Frame()[recordOffsetInFrame_ + *recordLength] == '\n') {
           ++recordOffsetInFrame_;
         }
         recordOffsetInFrame_ += *recordLength;
@@ -470,6 +472,9 @@ bool ExternalFileUnit::AdvanceRecord(IoErrorHandler &handler) {
     CommitWrites();
     impliedEndfile_ = true;
     ++currentRecordNumber;
+    if (endfileRecordNumber && currentRecordNumber >= *endfileRecordNumber) {
+      endfileRecordNumber.reset();
+    }
     return ok;
   }
 }
@@ -480,7 +485,8 @@ void ExternalFileUnit::BackspaceRecord(IoErrorHandler &handler) {
         "BACKSPACE(UNIT=%d) on non-sequential file", unitNumber());
   } else {
     if (endfileRecordNumber && currentRecordNumber > *endfileRecordNumber) {
-      // BACKSPACE after ENDFILE
+      // BACKSPACE after explicit ENDFILE
+      currentRecordNumber = *endfileRecordNumber;
     } else {
       DoImpliedEndfile(handler);
       if (frameOffsetInFile_ + recordOffsetInFrame_ > 0) {
@@ -534,7 +540,9 @@ void ExternalFileUnit::Endfile(IoErrorHandler &handler) {
     // ENDFILE after ENDFILE
   } else {
     DoEndfile(handler);
-    ++currentRecordNumber;
+    // Explicit ENDFILE leaves position *after* the endfile record
+    RUNTIME_CHECK(handler, endfileRecordNumber.has_value());
+    currentRecordNumber = *endfileRecordNumber + 1;
   }
 }
 

diff  --git a/flang/unittests/RuntimeGTest/Namelist.cpp b/flang/unittests/RuntimeGTest/Namelist.cpp
index 77eec4e341322..8f896cbe2448a 100644
--- a/flang/unittests/RuntimeGTest/Namelist.cpp
+++ b/flang/unittests/RuntimeGTest/Namelist.cpp
@@ -57,7 +57,7 @@ TEST(NamelistTests, BasicSanity) {
   complexes.push_back(std::complex<float>{123.0, -0.5});
   std::vector<std::string> characters;
   characters.emplace_back("aBcDeFgHiJkLmNoPqRsTuVwXyZ");
-  characters.emplace_back("0123456789'\"              ");
+  characters.emplace_back("0123456789'\"..............");
   // Copy the data into new descriptors
   OwningPtr<Descriptor> intDesc{
       MakeArray<TypeCategory::Integer, static_cast<int>(sizeof(int))>(
@@ -97,9 +97,9 @@ TEST(NamelistTests, BasicSanity) {
                                   " -1.7976931348623157E+308       "
                                   " 2.220446049250313E-16,LOGICALS="
                                   "F T F,COMPLEXES= (123.,-.5),    "
-                                  " CHARACTERS= 'aBcDeFgHiJkLmNoPq'"
-                                  "'RsTuVwXyZ' '0123456789''\"     '"
-                                  "'         '/                    "};
+                                  " CHARACTERS= 'aBcDeFgHiJkLmNoPqR"
+                                  "sTuVwXyZ' '0123456789''\"........"
+                                  "......'/                        "};
   std::string got{buffer[0], sizeof buffer};
   EXPECT_EQ(got, expect);
 


        


More information about the flang-commits mailing list