[PATCH] D99889: [AsmParser][ms][X86] Fix possible error in parsing of special tokens at start of string.
Anirudh Prasad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 5 09:58:47 PDT 2021
anirudhp created this revision.
Herald added subscribers: pengfei, hiraditya.
anirudhp requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
- Previously, https://reviews.llvm.org/D72680 introduced a new attribute called `AllowSymbolAtNameStart` (in relation to the MAsmParser changes) in `MCAsmInfo.h` which (according to the comment in the header) allows the following behaviour:
c++
/// This is true if the assembler allows $ @ ? characters at the start of
/// symbol names. Defaults to false.
- However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons.
c++
default:
if (MAI.doesAllowSymbolAtNameStart()) {
// Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
if (!isDigit(CurChar) &&
isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
AllowHashInIdentifier))
return LexIdentifier();
}
1. The Dollar and At tokens are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier.
2. I'm not too sure why `MAI.doesAllowAtInName()` is used when `AllowAtInIdentifier` could be used. For X86 platforms, afaict, this shouldn't be an issue, since the `CommentString` attribute isn't "@". (alternatively the call to the setter can be set anywhere else as needed).
My proposal is the following:
- Introduce 3 new fields called `AllowQuestionTokenAtStartOfString`, `AllowDollarTokenAtStartOfString` and `AllowAtTokenAtStartOfString` in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names")
- Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether `Allow[Dollar|At]TokenAtStartOfString` is set.
- For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier.
z/OS (HLASM) will also make use of some of these fields in follow up patches.
completely accurate* - This was based on the comments and the intended behaviour the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :)
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D99889
Files:
llvm/include/llvm/MC/MCAsmInfo.h
llvm/lib/MC/MCParser/AsmLexer.cpp
llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D99889.335279.patch
Type: text/x-patch
Size: 8731 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210405/b150a238/attachment-0001.bin>
More information about the llvm-commits
mailing list