[PATCH] Add support for ARM modified immediate syntax

Mon Jul 29 12:55:27 PDT 2013

On Jul 29, 2013, at 11:47 AM, Tim Northover <t.p.northover at gmail.com> wrote:

> Hi Mihail,
> 
>> We'd have to change how the immediate is represented.
> 
> Ah, I didn't realise it was a truly unencoded immediate. Still, it
> would be changing the immediate to be in line with the goals of the MC
> layer.
> 
>> If we just parse this syntax
>> and keep the internal representation of this kind of operand we are
>> loosing the ability to differentiate between equivalent yet different
>> encodings.
> 
> Agreed, that would be no good. For a start there would be nothing to
> forward the information on to the encoder.
> 
>> and r0, r1, #1, #0
>> and r0, r1, #1, #2
>> ...
>> and r0, r1, #1, #30
>> 
>> are are equivalent but have different encodings.
> 
> The first one actually appears to be different in its behaviour w.r.t.
> the 'C' flag, to be a pedant.
> 
>> The point is to retain the exact form that was parsed.
> 
> I think that's a misguided goal. Each bit-pattern really has a
> canonical textual representation, and we should be trying to print
> that rather than any non-canonical alias someone wrote.
> 

+1. This is very important. The MCInst should not have any memory of what the assembly syntax form was.

>> This implies that internally MC needs to be changed to hold the [value,
>> modifier] pair rather than the value itself. Isn't this uglier, more difficult and
>> more risky?
> 
> It's the preferred MC-layer representation, as I think we've discussed
> before. But beyond that, I think it's necessary longer-term…

Changing the MC representation should be orthogonal to parsing this alternative syntax. Having the value be an encoded pair (still a single immediate operand with bitfields for the components) seems a reasonable thing to do, but it’s not required for this.

> 
>> To me it seems more natural to solve an essentially syntactical problem with
>> a asm parsing only solution.
> 
> This problem has a disassembly layer manifestation too (which you'll
> almost certainly meet in one of MC Hammer's other modes):
> 
> $ echo [0x00,0x01,0xb0,0xe3] | bin/llvm-mc -disassemble -arch=arm
>        movs    r0, #0
> 
> (should be "movs r0, #0, #1" I think). I believe the closest extension
> of your proposed solution that can handle this would be adding a
> DecoderMethod to each instruction, which sets the MCInst's opcode to
> your AsmParserOnly variant if it detected an odd immediate.
> 
> That would be beyond horrific and into nightmare-inducing.
> 
>> The other thing is that I have doubts about the asm parser's ability to
>> support this, even via custom parsing methods. In the end what tablegen generates is a
>> parser in name only. It's not a real parser.
> 
> If I'm understanding you correctly, that's what comes out if you set
> just the Name method (the isXXX and addXXX methods). The ParserMethod
> gets called by TableGen as a short-circuit hook in normal parsing and
> has direct access to the token stream of the assembler
> ("AsmToken::Hash, AsmToken::Integer, AsmToken::Comma, ...").
> 
>> Regarding the duplication, this is a tablegen improvement opportunity. We
>> should  be able to factor out the definitions common to all members of a
>> multiclass; but alas this doesn't work.
> 
> That's really a side issue. We'd still be left with two Instruction
> instances representing the same instruction, which is not a good state
> to be in.

It’s more than that. It's a fundamental problem. We did a lot of work back in the original MC-ization of ARM to get rid of lots of those, and it’s a non-starter to go back the other direction.

Tim is completely correct. This problem should be handled completely in the AsmParser, not the instruction encoder.

-Jim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130729/e7fe45a0/attachment.html>