[PATCH] D99889: [AsmParser][ms][X86] Fix possible misbehaviour in parsing of special tokens at start of string.
Anirudh Prasad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 13 12:18:33 PDT 2021
anirudhp added inline comments.
================
Comment at: llvm/lib/MC/MCParser/AsmLexer.cpp:771-772
switch (CurChar) {
default:
- if (MAI.doesAllowSymbolAtNameStart()) {
- // Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
- if (!isDigit(CurChar) &&
- isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
- AllowHashInIdentifier))
- return LexIdentifier();
- } else {
- // Handle identifier: [a-zA-Z_.][a-zA-Z0-9_$.@]*
- if (isalpha(CurChar) || CurChar == '_' || CurChar == '.')
- return LexIdentifier();
- }
+ // Handle identifier: [a-zA-Z_.?][a-zA-Z0-9_$.@#?]*
+ if (isalpha(CurChar) || CurChar == '_' || CurChar == '.' ||
----------------
Jonathan.Crowther wrote:
> Previously, the Microsoft-style identifier allowed `#` (not as a starting character), but otherwise the identifier (`[a-zA-Z_.][a-zA-Z0-9_$.@]*`) did not. Should we be checking `AllowHashInIdentifier`?
>Previously, the Microsoft-style identifier allowed # (not as a starting character)
I wouldn't say that MS previously allowed "#" as a starting character, nor as part of the identifier. I don't believe the call to `setAllowHashInIdentifier` is set (apart from the eventual call to the setter for z/OS). LexIdentifier() eventually calls `IsIdentifierChar` with `AllowAtInIdentifier` and `AllowHashInIdentifier`.
(As an aside, "#" is special in most cases. In most cases, strings that start with "#" are already lexed as a possible comment)
================
Comment at: llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp:544-545
+}
+
+TEST_F(SystemZAsmLexerTest, CheckDefaultHashAtStartOfIdentifier) {
+ StringRef AsmStr = "##lj1?23";
----------------
Jonathan.Crowther wrote:
> Not sure what you're checking here that the `CheckStrictCommentString*` tests don't already check.
The idea was to show that even under default circumstances, "#" is usually lexed as a default comment. But like you say its probably already covered in the CommentString tests. I'll look at it a bit deeply again, and update if needed.
================
Comment at: llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp:557
+ lexAndCheckTokens(AsmStr, ExpectedTokens);
+}
} // end anonymous namespace
----------------
Jonathan.Crowther wrote:
> I think there should be a test for `$` at the start of an identifier for completeness. I don't see one in `SystemZAsmLexerTest.cpp` but you may know of one somewhere else.
Hmm you're right. I want to say there was a reason I didn't include the test for the "$" but honestly, now I don't remember. I'll go back and add the tests and if there was a problem / I remember why I didn't add it initially, I'll update this thread.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D99889/new/
https://reviews.llvm.org/D99889
More information about the llvm-commits
mailing list