[LLVMdev] RFC building a target MCAsmParser

Tue Apr 14 11:16:54 PDT 2015

Frankly, the MC assembler infrastructure is pretty weak. I've spoken
personally to some of the developers who wrote it, and they said they know
it's bad and suggested that it be rewritten. My experience from attempting
to make MC parse Intel x86 asm confirms this.

Here are some questions for you:
Do you need a high-level macro assembler designed for human consumption?
Do you need Clang to understand inline assembly that uses these high-level
assembly constructs?
Do you need the assembler to parse the output of `clang -S`?

It might be best to have your own small frontend for the macro assembler
language that produces MCInsts, which can then be printed using a more
traditional gas-like mnemonic. Running llvm-mc on a Hexagon assembly file
and re-emitting assembly would parse the high-level assembly and produce
the low-level assembly.

Alternatively, you can do as you say and try to make MC into more of a real
language frontend. The major problem is that lexing rules are not the same
across all targets, even when they all use a gas-like syntax.

On Tue, Apr 14, 2015 at 10:58 AM, Colin LeMahieu <colinl at codeaurora.org>
wrote:

> Hi everyone.  We’re interested in contributing a Hexagon assembler to MC
> and we’re looking for comments on a good way to integrate the grammar into
> the infrastructure.
>
>
>
> We rely on having a robust assembler because we have a large base of
> developers that write in assembly due to low power requirements for mobile
> devices.  We put in some C-like concepts to make the syntax easier and this
> design is fairly well received by users.
>
>
>
> The following is a list of grammar snippets we’ve had trouble integrating
> into the asm parser framework.
>
>
>
> Instruction packets are optionally enclosed in braces.
>
>     { r0 = add(r1, r2) r1 = add(r2, r0) }
>
>
>
> A register can be the beginning of a statement.  Register transfers have
> no mnemonic.
>
>     r0 = r1
>
>
>
> Double registers have a colon in the middle, which can look like a label.
>
>     r1:0 = add(r3:2, r5:4)
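To make the label ambiguity concrete, here is a rough Python sketch of a statement-start classifier. It is not MC code: the two regexes and the name `classify_statement_start` are hypothetical, and it assumes a double register is always written `r<digits>:<digits>` while a label is an identifier followed directly by a colon.

```python
import re

# Hypothetical patterns for illustration only (not taken from LLVM MC).
DOUBLE_REG = re.compile(r"r(\d+):(\d+)\b")
LABEL = re.compile(r"[A-Za-z_.][A-Za-z0-9_.$]*:")

def classify_statement_start(text):
    """Return 'double_reg', 'label', or 'other' for the start of a statement."""
    text = text.lstrip()
    # Try the register-pair form first; otherwise the 'r1:' prefix of
    # 'r1:0' would be accepted by the LABEL pattern.
    if DOUBLE_REG.match(text):
        return "double_reg"
    if LABEL.match(text):
        return "label"
    return "other"
```

The ordering is the whole point: a lexer that greedily treats `identifier:` as a label never gets the chance to see the register pair, which is why this kind of rule ends up being target-specific.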
>
>
>
> Predicated variants for many instructions
>
>     if(p1) r0 = add(r1, r2)
>
>
>
> Dense semantics for DSP applications.  Complex multiply optionally
> shifting result left by 1 with optional rounding and optional saturation
>
>     r0 = cmpy(r1, r2):<<1:rnd:sat
>
>
>
> Hardware loops ended by optional packet suffix
>
>     { r0 = r1 }:endloop0:endloop1
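As a small illustration of the packet syntax, the following Python sketch splits a line into the packet body and its loop-end suffixes. It assumes the only suffixes are `:endloop0` and `:endloop1`, as in the example above, and that packets do not nest; `PACKET` and `split_packet` are made-up names.

```python
import re

# Hypothetical pattern: a braced body followed by zero or more
# ':endloop0' / ':endloop1' suffixes. Nested braces are not handled.
PACKET = re.compile(r"\{(?P<body>[^{}]*)\}(?P<suffix>(?::endloop[01])*)\s*$")

def split_packet(line):
    """Return (body, suffixes) for a braced packet or a bare statement."""
    line = line.strip()
    m = PACKET.match(line)
    if m is None:
        # Braces are optional, so a bare statement is a packet with no suffix.
        return line, []
    suffixes = [s for s in m.group("suffix").split(":") if s]
    return m.group("body").strip(), suffixes
```

The body would still need to be split into individual instructions, which is not attempted here.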
>
>
>
> We found the Hexagon grammar to be straightforward to implement using
> plain lex / parse but harder within the MCTargetAsmParser.
>
>
>
> We were thinking that one way to get the grammar to work would involve
> modifying tablegen and the main asm parser loop.  We’d have to make
> tablegen break down each instruction into a sequence of tokens and build a
> sorted matching table based on the set of these sequences.  The matching
> loop would bisect this sorted list looking for a match.  We think existing
> grammars would be unaffected; all existing instructions start with a
> mnemonic so their first token would be an identifier followed by the same
> sequence of tokens they currently have.
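The sorted-table idea described above can be sketched in a few lines of Python. This is only a toy model of the proposal, not how tablegen or the MC matcher actually works: the tokenizer, the `REG` placeholder, and the opcode names (`TRANSFER`, `ADD`, `CMPY_S1_SAT`) are all invented for illustration, and the table is written by hand where the proposal would have tablegen generate it.

```python
import bisect
import re

# Hypothetical tokenizer: identifiers, integers, '<<', and single punctuation.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\d+|<<|[=(),:{}])")

def tokenize(text):
    """Split an instruction string into a list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def shape(tokens):
    """Replace register operands with 'REG' so operands do not affect matching."""
    return tuple("REG" if re.fullmatch(r"r\d+", t) else t for t in tokens)

# Hand-written stand-in for a generated table, sorted by token sequence.
MATCH_TABLE = sorted([
    (shape(tokenize("r0 = r1")), "TRANSFER"),
    (shape(tokenize("r0 = add(r1, r2)")), "ADD"),
    (shape(tokenize("r0 = cmpy(r1, r2):<<1:sat")), "CMPY_S1_SAT"),
])
KEYS = [key for key, _ in MATCH_TABLE]

def match(text):
    """Bisect the sorted table; return the opcode name or None."""
    key = shape(tokenize(text))
    i = bisect.bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return MATCH_TABLE[i][1]
    return None
```

For example, `match("r5 = add(r6, r7)")` finds the `ADD` entry even though the statement starts with a register rather than a mnemonic. An exact-match bisect like this only works once the whole statement has been tokenized; operand classes, immediates, and ambiguous prefixes would need more than this sketch shows.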
>
>
>
> Let us know if we’re likely to run into any issues making these changes
> or if there are other recommendations on what we could do.  Thanks!
>
>
>
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project