[PATCH] Add support for ARM modified immediate syntax
Tim Northover
t.p.northover at gmail.com
Mon Jul 29 11:47:38 PDT 2013
Hi Mihail,
> We'd have to change how the immediate is represented.
Ah, I didn't realise it was a truly unencoded immediate. Still, it
would be changing the immediate to be in line with the goals of the MC
layer.
> If we just parse this syntax
> and keep the internal representation of this kind of operand we are
> loosing the ability to differentiate between equivalent yet different
> encodings.
Agreed, that would be no good. For a start there would be nothing to
forward the information on to the encoder.
> and r0, r1, #1, #0
> and r0, r1, #1, #2
> ...
> and r0, r1, #1, #30
>
> are are equivalent but have different encodings.
The first one actually appears to be different in its behaviour w.r.t.
the 'C' flag, to be a pedant.
> The point is to retain the exact form that was parsed.
I think that's a misguided goal. Each bit-pattern really has a
canonical textual representation, and we should be trying to print
that rather than any non-canonical alias someone wrote.
> This implies that internally MC needs to be changed to hold the [value,
> modifier] pair rather than the value itself. Isn't this uglier, more difficult and
> more risky?
It's the preferred MC-layer representation, as I think we've discussed
before. But beyond that, I think it's necessary longer-term...
> To me it seems more natural to solve an essentially syntactical problem with
> a asm parsing only solution.
This problem has a disassembly layer manifestation too (which you'll
almost certainly meet in one of MC Hammer's other modes):
$ echo [0x00,0x01,0xb0,0xe3] | bin/llvm-mc -disassemble -arch=arm
movs r0, #0
(should be "movs r0, #0, #1" I think). I believe the closest extension
of your proposed solution that can handle this would be adding a
DecoderMethod to each instruction, which sets the MCInst's opcode to
your AsmParserOnly variant if it detected an odd immediate.
That would be beyond horrific and into nightmare-inducing.
> The other thing is that I have doubts about the asm parser's ability to
> support this, even via custom parsing methods. In the end what tablegen generates is a
> parser in name only. It's not a real parser.
If I'm understanding you correctly, that's what comes out if you set
just the Name method (the isXXX and addXXX methods). The ParserMethod
gets called by TableGen as a short-circuit hook in normal parsing and
has direct access to the token stream of the assembler
("AsmToken::Hash, AsmToken::Integer, AsmToken::Comma, ...").
> Regarding the duplication, this is a tablegen improvement opportunity. We
> should be able to factor out the definitions common to all members of a
> multiclass; but alas this doesn't work.
That's really a side issue. We'd still be left with two Instruction
instances representing the same instruction, which is not a good state
to be in.
Cheers.
Tim.
More information about the llvm-commits
mailing list