[LLVMdev] RFC building a target MCAsmParser

Colin LeMahieu colinl at codeaurora.org
Wed Apr 15 12:22:54 PDT 2015


One possibility we'd like feedback on is allowing a target to handle the
parsing process entirely.

 

We have a generated parser that can output other compiler IRs, and this could
be changed to output MCInsts.  If we could get the input text stream and an
output MC stream, we could do all parsing in a target-specific way.
Perhaps this would also be useful to other targets that have difficulty
fitting their grammar into the existing framework?

 

Since a parser generator isn't distributed with the project, we could publish
both the parser generator input and its generated output.

 

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Colin LeMahieu
Sent: Tuesday, April 14, 2015 12:59 PM
To: 'LLVM Developers Mailing List'
Subject: [LLVMdev] RFC building a target MCAsmParser

 

Hi everyone.  We're interested in contributing a Hexagon assembler to MC and
we're looking for comments on a good way to integrate the grammar into the
infrastructure.

 

We rely on having a robust assembler because we have a large base of
developers who write in assembly because of the low power requirements of
mobile devices.  We added some C-like concepts to make the syntax easier, and
this design has been fairly well received by users.

 

The following is a list of grammar snippets we've had trouble integrating
into the asm parser framework.

 

Instruction packets are optionally enclosed in braces.

    { r0 = add(r1, r2) r1 = add(r2, r0) }

 

A register can be the beginning of a statement.  Register transfers have no
mnemonic.

    r0 = r1

 

Double registers have a colon in the middle, which can look like a label.

    r1:0 = add(r3:2, r5:4)
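
To illustrate the ambiguity, here is a minimal Python sketch of a statement-start
check that tries a register pair before a label.  The regular expressions and
the function name are hypothetical and are not taken from any existing Hexagon
or LLVM lexer; they only assume that a pair is written as rN:M and that a label
is an identifier followed by a colon.

```python
import re

# Hypothetical patterns; the real lexer rules are not given in this post.
DOUBLE_REG = re.compile(r"r(\d+):(\d+)\b")   # e.g. r1:0, r3:2
LABEL = re.compile(r"([A-Za-z_.][\w.]*):")   # e.g. loop_start:

def classify_start(text):
    """Classify what a statement begins with, preferring a register pair."""
    text = text.lstrip()
    m = DOUBLE_REG.match(text)
    if m:
        return ("double_reg", int(m.group(1)), int(m.group(2)))
    m = LABEL.match(text)
    if m:
        return ("label", m.group(1))
    return ("other",)
```

Trying the register-pair rule first is one possible design choice; a label
named r1 followed by a colon and a space still classifies as a label because
no digits follow the colon.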

 

Many instructions have predicated variants.

    if(p1) r0 = add(r1, r2)

 

Dense semantics for DSP applications.  For example, a complex multiply that
optionally shifts the result left by 1, with optional rounding and optional
saturation.

    r0 = cmpy(r1, r2):<<1:rnd:sat
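
As a rough sketch of how such a modifier suffix could be separated from the
instruction body, the following Python splits off everything after the last
closing parenthesis and decodes it.  The function names and the flag
dictionary are hypothetical, and only the three modifiers shown above are
recognized.

```python
def split_modifiers(stmt):
    """Split 'body):mod:mod' into the body and a list of modifier strings."""
    body, paren, tail = stmt.rpartition(")")
    if not paren:
        return stmt.strip(), []
    mods = [m for m in tail.strip().split(":") if m]
    return (body + paren).strip(), mods

def decode_modifiers(mods):
    """Turn modifier strings into flags (hypothetical representation)."""
    flags = {"shift": 0, "rnd": False, "sat": False}
    for m in mods:
        if m.startswith("<<"):
            flags["shift"] = int(m[2:])
        elif m in ("rnd", "sat"):
            flags[m] = True
        else:
            raise ValueError("unknown modifier: " + m)
    return flags
```

Splitting at the last parenthesis keeps the colons inside double registers
such as r1:0 out of the modifier list.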

 

Hardware loops are ended by an optional packet suffix.

    { r0 = r1 }:endloop0:endloop1

 

We found the Hexagon grammar straightforward to implement using plain
lex / parse but harder within the MCTargetAsmParser.

 

We think one way to get the grammar to work would involve modifying
tablegen and the main asm parser loop.  We'd have to make tablegen break
down each instruction into a sequence of tokens and build a sorted
matching table based on the set of these sequences.  The matching loop would
bisect this sorted list looking for a match.  Existing grammars should be
unaffected: all existing instructions start with a mnemonic, so their first
token would be an identifier followed by the same sequence of tokens they
currently have.
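
The idea can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions, not tablegen output: the token classes
("reg", "pred"), the opcode names, and the mnemonic-first "nop" entry are all
hypothetical, and operand classes are reduced to a simple regular-expression
check.

```python
import re
from bisect import bisect_left

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def token_classes(text):
    """Lex a statement and replace operands by their (hypothetical) class."""
    out = []
    for tok in TOKEN.findall(text):
        if re.fullmatch(r"r\d+", tok):
            out.append("reg")
        elif re.fullmatch(r"p\d+", tok):
            out.append("pred")
        else:
            out.append(tok)  # punctuation and identifiers match literally
    return tuple(out)

# Example syntax strings mapped to made-up opcode names.
PATTERNS = {
    "r0 = r1": "TFR",
    "r0 = add(r1, r2)": "ADD_rr",
    "if(p1) r0 = add(r1, r2)": "ADD_rr_pred",
    "nop": "NOP",  # mnemonic-first form, as in existing grammars
}

# Sorted table of (token sequence, opcode), built once ahead of time.
TABLE = sorted((token_classes(p), name) for p, name in PATTERNS.items())
KEYS = [key for key, _ in TABLE]

def match_statement(text):
    """Bisect the sorted table for an exact token-sequence match."""
    key = token_classes(text)
    i = bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return TABLE[i][1]
    return None
```

A mnemonic-first instruction is just a sequence whose first token is an
identifier, so it sits in the same table as the register-first forms.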

 

Let us know if we're likely to run into any issues making these changes or
if there are other recommendations on what we could do.  Thanks!

 

Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project

 
