[PATCH] Add support for ARM modified immediate syntax

Tim Northover t.p.northover at gmail.com
Mon Jul 29 11:47:38 PDT 2013


Hi Mihail,

> We'd have to change how the immediate is represented.

Ah, I didn't realise it was a truly unencoded immediate. Still, it
would be changing the immediate to be in line with the goals of the MC
layer.

> If we just parse this syntax
> and keep the internal representation of this kind of operand we are
> loosing the ability to differentiate between equivalent yet different
> encodings.

Agreed, that would be no good. For a start there would be nothing to
forward the information on to the encoder.

> and r0, r1, #1, #0
> and r0, r1, #1, #2
> ...
> and r0, r1, #1, #30
>
> are are equivalent but have different encodings.

The first one actually appears to be different in its behaviour w.r.t.
the 'C' flag, to be a pedant.

> The point is to retain the exact form that was parsed.

I think that's a misguided goal. Each bit-pattern really has a
canonical textual representation, and we should be trying to print
that rather than any non-canonical alias someone wrote.

> This implies that internally MC needs to be changed to hold the [value,
> modifier] pair rather than the value itself. Isn't this uglier, more difficult and
> more risky?

It's the preferred MC-layer representation, as I think we've discussed
before. But beyond that, I think it's necessary longer-term...

> To me it seems more natural to solve an essentially syntactical problem with
> a asm parsing only solution.

This problem has a disassembly layer manifestation too (which you'll
almost certainly meet in one of MC Hammer's other modes):

$ echo [0x00,0x01,0xb0,0xe3] | bin/llvm-mc -disassemble -arch=arm
        movs    r0, #0

(should be "movs r0, #0, #1" I think). I believe the closest extension
of your proposed solution that can handle this would be adding a
DecoderMethod to each instruction, which sets the MCInst's opcode to
your AsmParserOnly variant if it detected an odd immediate.

That would be beyond horrific and into nightmare-inducing.

> The other thing is that I have doubts about the asm parser's ability to
> support this, even via custom parsing methods. In the end what tablegen generates is a
> parser in name only. It's not a real parser.

If I'm understanding you correctly, that's what comes out if you set
just the Name method (the isXXX and addXXX methods). The ParserMethod
gets called by TableGen as a short-circuit hook in normal parsing and
has direct access to the token stream of the assembler
("AsmToken::Hash, AsmToken::Integer, AsmToken::Comma, ...").

> Regarding the duplication, this is a tablegen improvement opportunity. We
> should  be able to factor out the definitions common to all members of a
> multiclass; but alas this doesn't work.

That's really a side issue. We'd still be left with two Instruction
instances representing the same instruction, which is not a good state
to be in.

Cheers.

Tim.



More information about the llvm-commits mailing list