[PATCH] D99889: [AsmParser][ms][X86] Fix possible misbehaviour in parsing of special tokens at start of string.

Mon Apr 5 16:42:46 PDT 2021

anirudhp added a comment.

> It looks like this adds support for X86 identifiers starting with $ (I'm actually surprised this wasn't allowed before).

Well. Yes and no. If you take a look at the `parseIdentifier` routine in AsmParser.cpp, there is an edge case which covers when tokens start with @ and $, but there are conditions associated with that edge case. Namely, if you have a string which starts with [@|$] and the next token is an Identifier or an Integer, then you parse as an Integer. So if you have something like $abc, and you make a call to the `parseIdentifier` routine, then "$abc" will be parsed as a "whole" Identifier, even though, the lexer lexes it as {AsmToken::At,  AsmToken::Identifier}. But if you have something like "$$abc", then a call to the `parseIdentifier` routine will "fail" (and in the case the lexer lexes it as {AsmToken::At, AsmToken::At, AsmToken::Identifier}.

An argument could be made that the parseIdentifier routine could be modified or overridden wherever appropriate and given custom logic.

Part of this patch is also exploratory. The way I interpreted the comment `allow $ @ ? characters at the start of symbol names. Defaults to false.`, told me that if "$", "@", "?" occurred at the start of the string, it should be lexed as part of an Identifier. However, now I don't know if that comment was meant to be written taking into account the `parseIdentifier` routine. In that case, the X86MCAsmInfo backend, should only set `AllowQuestionTokenAtStartOfString` to true. The other two are not "required" since `parseIdentifier` would "take care" of it. I'm also not too familiar with the ms semantics :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99889/new/

https://reviews.llvm.org/D99889