[PATCH] D99889: [AsmParser][ms][X86] Fix possible error in parsing of special tokens at start of string.

Anirudh Prasad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 5 09:58:47 PDT 2021


anirudhp created this revision.
Herald added subscribers: pengfei, hiraditya.
anirudhp requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

- Previously, https://reviews.llvm.org/D72680 introduced a new attribute called `AllowSymbolAtNameStart` (in relation to the MAsmParser changes) in `MCAsmInfo.h` which (according to the comment in the header) allows the following behaviour:

  c++
    /// This is true if the assembler allows $ @ ? characters at the start of
    /// symbol names. Defaults to false.



- However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons.

  c++
    default:
      if (MAI.doesAllowSymbolAtNameStart()) {
        // Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
        if (!isDigit(CurChar) &&
            isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
                             AllowHashInIdentifier))
          return LexIdentifier();
      }



1. The Dollar and At tokens are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier.
2. I'm not too sure why `MAI.doesAllowAtInName()` is used when `AllowAtInIdentifier` could be used. For X86 platforms, afaict, this shouldn't be an issue, since the `CommentString` attribute isn't "@". (alternatively the call to the setter can be set anywhere else as needed).

My proposal is the following:

- Introduce 3 new fields called `AllowQuestionTokenAtStartOfString`, `AllowDollarTokenAtStartOfString` and `AllowAtTokenAtStartOfString` in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names")
- Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether `Allow[Dollar|At]TokenAtStartOfString` is set.
- For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier.

z/OS (HLASM) will also make use of some of these fields in follow up patches.

completely accurate* - This was based on the comments and the intended behaviour the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :)


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99889

Files:
  llvm/include/llvm/MC/MCAsmInfo.h
  llvm/lib/MC/MCParser/AsmLexer.cpp
  llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
  llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D99889.335279.patch
Type: text/x-patch
Size: 8731 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210405/b150a238/attachment-0001.bin>


More information about the llvm-commits mailing list