[all-commits] [llvm/llvm-project] 8f6185: [AsmParser][ms][X86] Fix possible misbehaviour in ...

Anirudh Prasad via All-commits all-commits at lists.llvm.org
Wed Apr 21 07:21:29 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 8f6185c71378467e8c27e8996ff2dbaed3e41dbc
      https://github.com/llvm/llvm-project/commit/8f6185c71378467e8c27e8996ff2dbaed3e41dbc
  Author: Anirudh Prasad <anirudh_prasad at hotmail.com>
  Date:   2021-04-21 (Wed, 21 Apr 2021)

  Changed paths:
    M llvm/include/llvm/MC/MCAsmInfo.h
    M llvm/lib/MC/MCParser/AsmLexer.cpp
    M llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
    M llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp

  Log Message:
  -----------
  [AsmParser][ms][X86] Fix possible misbehaviour in parsing of special tokens at start of string.

- Previously, https://reviews.llvm.org/D72680 introduced a new attribute called `AllowSymbolAtNameStart` (in relation to the MAsmParser changes) in `MCAsmInfo.h` which (according to the comment in the header) allows the following behaviour:

```
  /// This is true if the assembler allows $ @ ? characters at the start of
  /// symbol names. Defaults to false.
```

- However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons.

```
  default:
    if (MAI.doesAllowSymbolAtNameStart()) {
      // Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
      if (!isDigit(CurChar) &&
          isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
                           AllowHashInIdentifier))
        return LexIdentifier();
    }
```

1. The Dollar and At tokens, when occurring at the start of the string, are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier.
2. I'm not too sure why `MAI.doesAllowAtInName()` is used when `AllowAtInIdentifier` could be used. For X86 platforms, afaict, this shouldn't be an issue, since the `CommentString` attribute isn't "@". (alternatively the call to the setter can be set anywhere else as needed). The `AllowAtInName` does have an additional important meaning, but in the context of AsmLexer, shouldn't mean anything different compared to `AllowAtInIdentifier`

My proposal is the following:

- Introduce 3 new fields called `AllowQuestionTokenAtStartOfString`, `AllowDollarTokenAtStartOfString` and `AllowAtTokenAtStartOfString` in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names")
- Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether `Allow[Dollar|At]TokenAtStartOfString` is set.
- For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier.

z/OS (HLASM) will also make use of some of these fields in follow up patches.

completely accurate* - This was based on the comments and the intended behaviour the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :)

Depends on https://reviews.llvm.org/D99374

Reviewed By: Jonathan.Crowther

Differential Revision: https://reviews.llvm.org/D99889




More information about the All-commits mailing list