[PATCH] D99889: [AsmParser][ms][X86] Fix possible misbehaviour in parsing of special tokens at start of string.

Anirudh Prasad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 13 12:18:33 PDT 2021


anirudhp added inline comments.


================
Comment at: llvm/lib/MC/MCParser/AsmLexer.cpp:771-772
   switch (CurChar) {
   default:
-    if (MAI.doesAllowSymbolAtNameStart()) {
-      // Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
-      if (!isDigit(CurChar) &&
-          isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
-                           AllowHashInIdentifier))
-        return LexIdentifier();
-    } else {
-      // Handle identifier: [a-zA-Z_.][a-zA-Z0-9_$.@]*
-      if (isalpha(CurChar) || CurChar == '_' || CurChar == '.')
-        return LexIdentifier();
-    }
+    // Handle identifier: [a-zA-Z_.?][a-zA-Z0-9_$.@#?]*
+    if (isalpha(CurChar) || CurChar == '_' || CurChar == '.' ||
----------------
Jonathan.Crowther wrote:
> Previously, the Microsoft-style identifier allowed `#` (not as a starting character), but otherwise the identifier (`[a-zA-Z_.][a-zA-Z0-9_$.@]*`) did not. Should we be checking `AllowHashInIdentifier`?
>Previously, the Microsoft-style identifier allowed # (not as a starting character)

I wouldn't say that MS previously allowed "#" as a starting character, nor as part of the identifier. I don't believe the call to `setAllowHashInIdentifier` is set (apart from the eventual call to the setter for z/OS). LexIdentifier() eventually calls `IsIdentifierChar` with `AllowAtInIdentifier` and `AllowHashInIdentifier`.

(As an aside, "#" is special in most cases. In most cases, strings that start with "#" are already lexed as a possible comment)


================
Comment at: llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp:544-545
+}
+
+TEST_F(SystemZAsmLexerTest, CheckDefaultHashAtStartOfIdentifier) {
+  StringRef AsmStr = "##lj1?23";
----------------
Jonathan.Crowther wrote:
> Not sure what you're checking here that the `CheckStrictCommentString*` tests don't already check.
The idea was to show that even under default circumstances, "#" is usually lexed as a default comment. But like you say its probably already covered in the CommentString tests. I'll look at it a bit deeply again, and update if needed.


================
Comment at: llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp:557
+  lexAndCheckTokens(AsmStr, ExpectedTokens);
+}
 } // end anonymous namespace
----------------
Jonathan.Crowther wrote:
> I think there should be a test for `$` at the start of an identifier for completeness. I don't see one in `SystemZAsmLexerTest.cpp` but you may know of one somewhere else. 
Hmm you're right. I want to say there was a reason I didn't include the test for the "$" but honestly, now I don't remember. I'll go back and add the tests and if there was a problem / I remember why I didn't add it initially, I'll update this thread.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99889/new/

https://reviews.llvm.org/D99889



More information about the llvm-commits mailing list