[flang-commits] [flang] [flang] Inhibit case of false tokenization of Hollerith (PR #79029)

Peter Klausler via flang-commits flang-commits at lists.llvm.org
Mon Jan 22 10:31:46 PST 2024


https://github.com/klausler created https://github.com/llvm/llvm-project/pull/79029

https://github.com/llvm/llvm-project/issues/78927 contains a case of fixed-form source in which a Hollerith literal is mistakenly tokenized, leading to grief later due to apparently unbalanced parentheses.

The source looks like "REAL*8 R8HEAP(SCRSIZE)". Blanks are not significant in fixed form, so the text starting at "8H" is misrecognized as a Hollerith literal because it follows "8R". In order to properly tokenize Hollerith literals in old comma-free FORMAT statements like "1 FORMAT(3I5HFLANG)", the tokenizer in the prescanner treats a letter after an integer token ("3I") as a special case. The fix is to apply this special case only when the characters involved are nested in parentheses and Hollerith is a possibility.

Fixes https://github.com/llvm/llvm-project/issues/78927.
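
As a rough illustration of the condition change described above, here is a minimal standalone sketch. It is not the prescanner code: `ToyState`, `OldRuleConsumesLetter`, and `NewRuleConsumesLetter` are hypothetical names invented for this example, and only the two fields loosely mirror the `preventHollerith_` and `parenthesisNesting_` members that the patch consults.

```cpp
#include <cctype>

// Hypothetical stand-in for the two pieces of prescanner state that the
// patched condition reads; the real members live in the Prescanner class.
struct ToyState {
  bool preventHollerith{false};
  int parenthesisNesting{0};
};

// Old behavior (sketch): any letter immediately after the digits of an
// integer token is consumed by the special case.
bool OldRuleConsumesLetter(char next) {
  return std::isalpha(static_cast<unsigned char>(next)) != 0;
}

// New behavior (sketch): the letter is consumed only when Hollerith is
// still a possibility and the scan position is inside parentheses.
bool NewRuleConsumesLetter(char next, const ToyState &state) {
  return std::isalpha(static_cast<unsigned char>(next)) != 0 &&
      !state.preventHollerith && state.parenthesisNesting > 0;
}
```

Under these assumptions, the "R" after "8" in "REAL*8 R8HEAP(SCRSIZE)" sits at nesting depth 0 and is no longer consumed, while the "I" after "3" in "1 FORMAT(3I5HFLANG)" sits at depth 1 and still is.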

From f15469703deab1306079796853b0939ce9ff0c5e Mon Sep 17 00:00:00 2001
From: Peter Klausler <pklausler at nvidia.com>
Date: Mon, 22 Jan 2024 10:22:59 -0800
Subject: [PATCH] [flang] Inhibit case of false tokenization of Hollerith

https://github.com/llvm/llvm-project/issues/78927 contains a case
of fixed-form source in which a Hollerith literal is mistakenly
tokenized, leading to grief later due to apparently unbalanced
parentheses.

The source looks like "REAL*8 R8HEAP(SCRSIZE)" and the Hollerith
literal is misrecognized as such because it follows "8R".  In order
to properly tokenize Hollerith literals in old comma-free FORMAT
statements like "1 FORMAT(3I5HFLANG)", the tokenizer in the prescanner
treats a letter after an integer token ("3I") as a special case.
The fix is to do this only when the characters involved are nested
in parentheses and Hollerith is a possibility.

Fixes https://github.com/llvm/llvm-project/issues/78927.
---
 flang/lib/Parser/prescan.cpp | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/flang/lib/Parser/prescan.cpp b/flang/lib/Parser/prescan.cpp
index 68d7d9f0c53c47..029652adbca1df 100644
--- a/flang/lib/Parser/prescan.cpp
+++ b/flang/lib/Parser/prescan.cpp
@@ -605,13 +605,14 @@ bool Prescanner::NextToken(TokenSequence &tokens) {
       do {
         EmitCharAndAdvance(tokens, *at_);
       } while (IsHexadecimalDigit(*at_));
-    } else if (IsLetter(*at_)) {
-      // Handles FORMAT(3I9HHOLLERITH) by skipping over the first I so that
-      // we don't misrecognize I9HOLLERITH as an identifier in the next case.
-      EmitCharAndAdvance(tokens, *at_);
     } else if (at_[0] == '_' && (at_[1] == '\'' || at_[1] == '"')) { // 4_"..."
       EmitCharAndAdvance(tokens, *at_);
       QuotedCharacterLiteral(tokens, start);
+    } else if (IsLetter(*at_) && !preventHollerith_ &&
+        parenthesisNesting_ > 0) {
+      // Handles FORMAT(3I9HHOLLERITH) by skipping over the first I so that
+      // we don't misrecognize I9HOLLERITH as an identifier in the next case.
+      EmitCharAndAdvance(tokens, *at_);
     }
     preventHollerith_ = false;
   } else if (*at_ == '.') {


