[all-commits] [llvm/llvm-project] 8f6185: [AsmParser][ms][X86] Fix possible misbehaviour in ...
Anirudh Prasad via All-commits
all-commits at lists.llvm.org
Wed Apr 21 07:21:29 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 8f6185c71378467e8c27e8996ff2dbaed3e41dbc
https://github.com/llvm/llvm-project/commit/8f6185c71378467e8c27e8996ff2dbaed3e41dbc
Author: Anirudh Prasad <anirudh_prasad at hotmail.com>
Date: 2021-04-21 (Wed, 21 Apr 2021)
Changed paths:
M llvm/include/llvm/MC/MCAsmInfo.h
M llvm/lib/MC/MCParser/AsmLexer.cpp
M llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
M llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp
Log Message:
-----------
[AsmParser][ms][X86] Fix possible misbehaviour in parsing of special tokens at start of string.
- Previously, https://reviews.llvm.org/D72680 introduced a new attribute called `AllowSymbolAtNameStart` (in relation to the MAsmParser changes) in `MCAsmInfo.h` which (according to the comment in the header) allows the following behaviour:
```
/// This is true if the assembler allows $ @ ? characters at the start of
/// symbol names. Defaults to false.
```
- However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons.
```
default:
if (MAI.doesAllowSymbolAtNameStart()) {
// Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
if (!isDigit(CurChar) &&
isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
AllowHashInIdentifier))
return LexIdentifier();
}
```
1. The Dollar and At tokens, when occurring at the start of the string, are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier.
2. I'm not too sure why `MAI.doesAllowAtInName()` is used when `AllowAtInIdentifier` could be used. For X86 platforms, afaict, this shouldn't be an issue, since the `CommentString` attribute isn't "@". (alternatively the call to the setter can be set anywhere else as needed). The `AllowAtInName` does have an additional important meaning, but in the context of AsmLexer, shouldn't mean anything different compared to `AllowAtInIdentifier`
My proposal is the following:
- Introduce 3 new fields called `AllowQuestionTokenAtStartOfString`, `AllowDollarTokenAtStartOfString` and `AllowAtTokenAtStartOfString` in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names")
- Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether `Allow[Dollar|At]TokenAtStartOfString` is set.
- For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier.
z/OS (HLASM) will also make use of some of these fields in follow up patches.
completely accurate* - This was based on the comments and the intended behaviour the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :)
Depends on https://reviews.llvm.org/D99374
Reviewed By: Jonathan.Crowther
Differential Revision: https://reviews.llvm.org/D99889
More information about the All-commits
mailing list