[LLVMdev] What does MCOperand model?

Greg Fitzgerald garious at gmail.com
Wed Sep 26 18:36:17 PDT 2012


> the MCOperand should not contain values that cannot be encoded

In the case of pre-encoding a shifted immediate, do we acknowledge
that we've only moved the invalid encodings from ones that set the
bottom bit to ones that set the top bit?
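
To make that concrete, here is a minimal sketch using the ranges from
the Thumb 2 'b' example quoted further down (even offsets from -2048
to 2046, an encoded field from -1024 to 1023).  The helper names and
the 11-bit width constant are my own, for illustration only; they are
not LLVM APIs.

```python
# Hypothetical helpers, not LLVM code. Ranges come from the quoted
# Thumb 2 'b' example: even byte offsets in [-2048, 2046].
IMM_BITS = 11  # assumed width of the immediate field in the 16-bit form


def is_valid_syntax_offset(offset):
    """Syntax model: the offset must be even and within range."""
    return offset % 2 == 0 and -2048 <= offset <= 2046


def pre_encode(offset):
    """Drop the always-zero bottom bit to get the field value."""
    return offset >> 1


def is_valid_encoded(value):
    """Encoding model: the value must fit a signed IMM_BITS-wide field."""
    lo = -(1 << (IMM_BITS - 1))
    hi = (1 << (IMM_BITS - 1)) - 1
    return lo <= value <= hi
```

With the syntax model, 1235 is representable but invalid because the
bottom bit is set.  With the pre-encoded model, 1024 is representable
in the operand's integer but invalid because it overflows the field.
Either way a range check is still needed somewhere.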

Is there a backend that is implemented in this style I can use as a reference?


> The MCOperand values are intended to model the instruction
> encoding. Where that doesn't match the assembly syntax,
> the asm parser (and codegen) and the instruction printer are
> responsible for encoding/decoding the values.

As my colleague and I try to implement instructions in the recommended
style, we are finding it harder under the constraint that the
MCOperand must be pre-encoded.  I've attached a diagram of my
understanding of the code flow when using the recommended style versus
using the MCOperand to model the syntax.  How far off am I?

[See attached]

In the diagram with pre-encoding, a shared function EncodeImm() has to
be referenced from 3 locations.  As a newcomer to LLVM going after a
simple encoding bug, I wasn't expecting to have to grok every client
of MCOperand just to fix how it is encoded.  To pre-encode, it seems
the .td file needs to use a custom operand that inherits from a
generic one for the sole purpose of routing to the shared encoding
function.  Is there a better alternative for getting from the LLVM
target-independent IR to the pre-encoded MCOperand?
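
A rough sketch of the coupling I mean, in plain Python rather than
LLVM code.  The diagram is not reproduced here, so the three call
sites below are my guess based on Jim's description (asm parser,
codegen, instruction printer); every function name is hypothetical,
and decode_imm() is the inverse I assume the printer would need.

```python
# Hypothetical stand-ins for the pre-encoding style; not LLVM APIs.

def encode_imm(offset):
    """Stands in for the shared EncodeImm(): byte offset -> field value."""
    if offset % 2 != 0:
        raise ValueError("offset must be even")
    return offset >> 1


def decode_imm(value):
    """Assumed inverse of encode_imm(): field value -> byte offset."""
    return value << 1


def parse_operand(text):
    """Asm parser: must pre-encode before building the operand."""
    return encode_imm(int(text.lstrip("#")))


def lower_operand(offset):
    """Codegen: must pre-encode when lowering to the operand."""
    return encode_imm(offset)


def print_operand(value):
    """Instruction printer: must undo the encoding to show the syntax."""
    return "#" + str(decode_imm(value))
```

For example, parse_operand("#1234") and lower_operand(1234) both
produce 617, and print_operand(617) gives back "#1234".  A fix to the
encoding has to be kept consistent across all three clients.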

Thanks,
Greg


On Wed, Sep 26, 2012 at 2:02 PM, Jim Grosbach <grosbach at apple.com> wrote:
> Owen is correct in his descriptions. The MCOperand values are intended to model the instruction encoding. Where that doesn't match the assembly syntax, the asm parser (and codegen) and the instruction printer are responsible for encoding/decoding the values.
>
> For targets that predate the MC layer, this isn't always the case, leading to things being a bit confusing when just reading the code. Any new targets should absolutely consider the instruction encoding to be the canonical representation and map assembly syntax onto that, not the other way around.
>
> Regards,
> -Jim
> On Sep 26, 2012, at 11:26 AM, Greg Fitzgerald <garious at gmail.com> wrote:
>
>> A question for LLVM code generator developers:
>>
>> After having read through "The LLVM Target-Independent Code Generator"
>> [1] I'm unclear about what precisely the objects MCInst and MCOperand
>> represent.  They sit in the space between assembly syntax and binary
>> encodings, but which are they modeling?  For example, a Thumb 2 branch
>> instruction 'b' takes an immediate.  That syntax "b #1234" can map to
>> a couple different encodings.  If it is an even number between -2048
>> and 2046, it can be encoded with a 16-bit instruction, otherwise a
>> 32-bit instruction.  If the MC objects are to model the syntax, then
>> one would expect both encodings to have identical values in the
>> MCOperand, a 32-bit signed integer.  On the other hand, if MC objects
>> are to model the encoding, one would expect the MCOperand for the
>> 16-bit encoding to contain a number between -1024 and 1023.  Which one
>> is it?
>>
>> My intuition says the MCOperand should model the assembly syntax and
>> contain the 32-bit signed integer, and that the EncoderMethod and
>> DecoderMethod are responsible for mapping that high-level number to
>> the low-level binary representation.  If, however, the MCOperand
>> models the encoding, then EncoderMethod and DecoderMethod glue need
>> not exist, and that bit-twiddling logic would be pushed to whoever
>> creates the MCOperand.
>>
>> Looking at the Thumb backend, I believe it has been written assuming
>> the MC objects model the syntax, not the encoding, which matches my
>> intuition.  There has been some discussion on the llvm-commits list
>> encouraging us to store the encoded value in the MCOperand.  The
>> justification, as I understand it, is that the MCOperand should not
>> contain values that cannot be encoded.  This effectively means that
>> the MCOperands would be modeling the binary encoding, not the syntax.
>> Are folks making this transition in other backends as well?
>>
>> [1] http://llvm.org/docs/CodeGenerator.html
>>
>> Thanks,
>> Greg