[llvm-dev] [ms] [LLVM-ML] Reserved words in assemblers

Eric Astor via llvm-dev llvm-dev at lists.llvm.org
Thu May 27 08:04:02 PDT 2021


Hi all,

I'm still working on MASM support via LLVM-ML, though it has been moved to
the back burner for lack of reviewers. If you'd be interested in reviewing,
please contact me!

However, I've also encountered an ambiguity in MASM syntax. For example,
    <VAR> DWORD 5
is a valid declaration of a 32-bit variable with value 5, but
    CALL DWORD PTR [<ADDRESS>]
is a valid x86 call instruction. In both statements DWORD is the second
token, so its position alone doesn't say whether it is a data directive or
part of an operand. (Yes, MASM has infix directives, and most of its
directives are valid identifiers.)

It looks like ML.EXE resolves this by maintaining a set of reserved words
that can't be used as identifiers: specifically, all native instructions, as
well as MASM directives, operators, and other predefined symbols.
Unfortunately, there's currently no interface in MCTargetAsmParser to check
whether a string is an instruction name, so replicating this is less trivial
than it might be.
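
A reserved-word check of this kind can be sketched in a few lines of
standalone C++. This is only an illustration: the word list is a
hypothetical five-entry sample, isReservedWord is a made-up name rather than
an LLVM or ML.EXE interface, and case-insensitive matching is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical, tiny sample of reserved words. A real list would hold every
// instruction mnemonic plus MASM directives, operators, and predefined symbols.
static const std::unordered_set<std::string> kReservedWords = {
    "call", "mov", "dword", "ptr", "proc"};

// Lower-cases an ASCII string so lookups ignore case (an assumption here).
static std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Returns true if `name` is in the sample reserved-word set and therefore
// could not be used as an identifier.
bool isReservedWord(const std::string &name) {
  return kReservedWords.count(toLower(name)) != 0;
}
```

With such a set, a parser could reject a reserved word in the identifier
position of a declaration before trying to interpret the rest of the line.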

I see 3 ways to resolve this problem:

1. (easy) disambiguate size directives (DWORD PTR) from variable
declarations (DWORD) by lookahead for "PTR" tokens. This covers the case I
know of so far, but there could be cases I haven't spotted yet. Draft
Phabricator patch: https://reviews.llvm.org/D103257

2. (medium) introduce a new function:
MCTargetAsmParser.isValidInstructionMnemonic (name to be bikeshedded). This
will have to be introduced for all MCTargetAsmParsers, which is not
ideal... but it can leverage existing GET_MNEMONIC_CHECKER infrastructure
to recognize the names from TableGen files. Using this, we can define our
list of reserved words, and work from there.

3. (hard) introduce a new function: MCTargetAsmParser.tryParseInstruction,
which parses an instruction if present and otherwise backtracks, restoring
parser & lexer state as if it had never been called. Again, this needs to
be introduced for all MCTargetAsmParsers. (We've already done this with
tryParseRegister, but that was relatively simple.) MasmParser can call this
*first*, and only try to recognize a directive if the instruction fails to
parse.
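
The save-and-restore behaviour described in option 3 can be illustrated with
a toy parser. This is a minimal sketch, not the MCTargetAsmParser API:
ToyParser, isMnemonic, and classify are hypothetical names, the "lexer
state" is just an index into a token vector, the mnemonic table is a
three-entry sample, and operand parsing is faked by consuming the rest of
the statement.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for parser and lexer state: a token list plus a cursor.
struct ToyParser {
  std::vector<std::string> tokens;
  std::size_t pos = 0;

  // Hypothetical mnemonic table (upper-case only, for brevity).
  static bool isMnemonic(const std::string &t) {
    static const std::unordered_set<std::string> kMnemonics = {"CALL", "MOV",
                                                               "RET"};
    return kMnemonics.count(t) != 0;
  }

  // Tries to parse an instruction at the cursor. On failure the cursor is
  // restored, so the caller sees the state as if this was never called.
  bool tryParseInstruction() {
    const std::size_t saved = pos;
    if (pos == tokens.size())
      return false;
    const std::string head = tokens[pos++]; // consume the candidate mnemonic
    if (!isMnemonic(head)) {
      pos = saved; // backtrack
      return false;
    }
    pos = tokens.size(); // pretend the operands parsed: consume the statement
    return true;
  }
};

// Tries the instruction parse first, and only then looks for an infix DWORD
// directive in the second token position.
std::string classify(std::vector<std::string> tokens) {
  ToyParser p;
  p.tokens = std::move(tokens);
  if (p.tryParseInstruction())
    return "instruction";
  if (p.pos + 1 < p.tokens.size() && p.tokens[p.pos + 1] == "DWORD")
    return "directive";
  return "unknown";
}
```

In a real target parser the state to restore is much larger than a single
index, which is part of why this option is marked hard.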

I'm currently leaning towards option #1. Is anyone opposed, or does anyone
see a significant benefit to having the other options for other reasons?
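
As a rough illustration of the lookahead in option #1 (this is not the code
in D103257), the check amounts to peeking one token past the type keyword.
The names DwordUse and classifyTypeKeyword are hypothetical, the statement is
assumed to be already split into tokens, and keywords are assumed to match
case-insensitively.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// The two readings of a type keyword such as DWORD.
enum class DwordUse { SizeOperator, DataDirective };

// ASCII case-insensitive string comparison.
static bool equalsIgnoreCase(const std::string &a, const std::string &b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(),
                    [](unsigned char x, unsigned char y) {
                      return std::tolower(x) == std::tolower(y);
                    });
}

// `i` is the index of the type keyword within `tokens`. If the next token is
// PTR, the keyword is read as a size operator; otherwise as a data directive.
DwordUse classifyTypeKeyword(const std::vector<std::string> &tokens,
                             std::size_t i) {
  const bool followedByPtr =
      i + 1 < tokens.size() && equalsIgnoreCase(tokens[i + 1], "PTR");
  return followedByPtr ? DwordUse::SizeOperator : DwordUse::DataDirective;
}
```

This handles the DWORD / DWORD PTR case only. Any ambiguity that doesn't
involve a following PTR token would need its own rule.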

Thanks,
- Eric