[LLVMdev] Hexagon Assembly parser question

Sun Nov 25 11:52:14 PST 2012

Hi Jim,

I am now able to parse out the mnemonic from the instruction table, and
create the .inc file.  I am currently doing this by searching for the
mnemonic based on instruction's structure.  Not quite ready to get the code
reviewed yet, but enough to move forward.

I'm now stuck on the second part of the email that you sent, concerning the
generic AsmParser.

In particular, your statement has been very prescient:

That said, you'll also likely have to do a bit of work in the generic
AsmParser code, as it'll likely look at statements like these and not
realize they're instruction sequences. The "mnemonic <whitespace> operands"
format is pretty strongly imprinted on everything.

For Hexagon's case, statements like these

  r0 = ##.L.str
  r1 = #0
  r0 = r1

fail because they are being parsed as assignments:

Line 1206: AsmParser.cpp

  case AsmToken::Equal:
    // identifier '=' ... -> assignment statement
    Lex();
    return ParseAssignment(IDVal, true);

as the equal sign is the second token.  Would it be possible to make this
check only after everything else has been tried?  That would let me decide
whether the '=' belongs to an instruction before the statement is classified
as an assignment.  I was also thinking that, since we are mainly trying to
match instructions, input parsing might be faster if we didn't try to
identify every statement as a directive first.
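
To make the idea concrete, here is a rough standalone sketch of the ordering
I have in mind.  It is not the AsmParser code: StmtKind, tokenize(),
isHexagonRegister() and classify() are names made up for illustration, and
the register check simply assumes the general-purpose registers r0..r31.
The target is asked about "identifier '='" first, and only if it declines
does the statement fall back to being an assignment.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification result; not an LLVM type.
enum class StmtKind { Instruction, Assignment, Other };

// Split a statement on whitespace, treating '=' as a token of its own.
std::vector<std::string> tokenize(const std::string &Line) {
  std::vector<std::string> Toks;
  std::string Cur;
  for (char C : Line) {
    if (std::isspace(static_cast<unsigned char>(C)) || C == '=') {
      if (!Cur.empty()) {
        Toks.push_back(Cur);
        Cur.clear();
      }
      if (C == '=')
        Toks.push_back("=");
    } else {
      Cur += C;
    }
  }
  if (!Cur.empty())
    Toks.push_back(Cur);
  return Toks;
}

// Hypothetical target hook: true for r0..r31 (assumed register naming).
bool isHexagonRegister(const std::string &Name) {
  if (Name.size() < 2 || Name.size() > 3 || Name[0] != 'r')
    return false;
  if (Name.size() == 3 && Name[1] == '0')
    return false; // reject forms such as "r01"
  int N = 0;
  for (std::size_t I = 1; I < Name.size(); ++I) {
    if (!std::isdigit(static_cast<unsigned char>(Name[I])))
      return false;
    N = N * 10 + (Name[I] - '0');
  }
  return N <= 31;
}

// Ask the target first; fall back to "assignment" only if it declines.
StmtKind classify(const std::string &Line) {
  std::vector<std::string> Toks = tokenize(Line);
  if (Toks.size() >= 2 && Toks[1] == "=")
    return isHexagonRegister(Toks[0]) ? StmtKind::Instruction
                                      : StmtKind::Assignment;
  return StmtKind::Other; // directives, labels, etc. are not modelled here
}
```

With this ordering, "r0 = r1" would be classified as an instruction while
"foo = 4" would still be an assignment.  The obvious cost is that a symbol
which happens to be named like a register could no longer be assigned to,
which is exactly the kind of ambiguity you warned about.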

Regards,

David 

The table is sorted by mnemonic (more abstractly, by operator). That's
pretty fundamental to how it works, so sticking with that would be good
unless you want to write an entirely new algorithm. You could probably stick
with the current basic stuff with some fiddling in tablegen where the asm
string gets split apart when building up matchables to re-order things
appropriately. Then your ParseInstruction() implementation would do similar
tricks. The printer should "just work," thankfully.
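
As a toy illustration of the re-ordering described above (not TableGen's
actual matcher generation), the sketch below moves the operator of a
"dst = src" statement to the front so it can serve as the lookup key, then
finds all candidates with that key in a table sorted by key.  Matchable,
reorder() and countCandidates() are hypothetical names, and the operator is
assumed to be the second whitespace-separated token.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical matchable: operator-first key plus the operand texts.
struct Matchable {
  std::string Key;                   // e.g. "="
  std::vector<std::string> Operands; // e.g. {"r0", "r1"}
};

// Re-order "dst OP src..." so the operator leads, mimicking the usual
// "mnemonic operands" shape. Returns false (leaving Out untouched) if the
// statement has fewer than three whitespace-separated tokens.
bool reorder(const std::string &Stmt, Matchable &Out) {
  std::istringstream In(Stmt);
  std::vector<std::string> Toks;
  for (std::string T; In >> T;)
    Toks.push_back(T);
  if (Toks.size() < 3)
    return false;
  Out.Key = Toks[1];
  Out.Operands.assign(1, Toks[0]);
  Out.Operands.insert(Out.Operands.end(), Toks.begin() + 2, Toks.end());
  return true;
}

// Count the table entries that share Key. SortedKeys must be sorted, which
// is what makes the binary search in std::equal_range valid.
std::size_t countCandidates(const std::vector<std::string> &SortedKeys,
                            const std::string &Key) {
  auto Range = std::equal_range(SortedKeys.begin(), SortedKeys.end(), Key);
  return static_cast<std::size_t>(Range.second - Range.first);
}
```

For "r0 = r1" this yields the key "=" with operands r0 and r1.  Every
entry sharing that key then has to be told apart by its operands alone,
since the key itself no longer identifies a single instruction.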

That said, you'll also likely have to do a bit of work in the generic
AsmParser code, as it'll likely look at statements like these and not
realize they're instruction sequences. The "mnemonic <whitespace> operands"
format is pretty strongly imprinted on everything. That's not completely
unfixable, of course, but it may be a bit tricky to avoid syntactic
ambiguities. Not impossible, mind, just tricky and something to pay very
close attention to in your design.

-Jim
