[LLVMdev] What does MCOperand model?

Wed Sep 26 14:02:38 PDT 2012

Owen is correct in his descriptions. The MCOperand values are intended to model the instruction encoding. Where that doesn't match the assembly syntax, the asm parser (and codegen) and the instruction printer are responsible for encoding/decoding the values.

For targets that predate the MC layer, this isn't always the case, which can make things a bit confusing when just reading the code. Any new target should absolutely treat the instruction encoding as the canonical representation and map assembly syntax onto that, not the other way around.
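
To make the distinction concrete for the 16-bit Thumb branch in the question quoted below, here is a minimal C++ sketch. It is not code from the ARM backend: the helper names encodeThumbBranch16, decodeThumbBranch16 and fieldBits are hypothetical, and it assumes a plain immediate byte offset, ignoring the PC bias and any label fixups.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Syntax-level value -> encoding-level value.
// The 16-bit form only accepts even byte offsets in [-2048, 2046].
// On success, imm11 receives the halfword count in [-1024, 1023].
bool encodeThumbBranch16(int32_t byteOffset, int32_t &imm11) {
  if (byteOffset % 2 != 0 || byteOffset < -2048 || byteOffset > 2046)
    return false; // does not fit; a wider encoding would be needed
  imm11 = byteOffset / 2;
  return true;
}

// Encoding-level value -> syntax-level value (what a printer would show).
int32_t decodeThumbBranch16(int32_t imm11) {
  return imm11 * 2;
}

// The raw 11 bits as they would sit in the instruction word
// (two's complement, truncated to the field width).
uint32_t fieldBits(int32_t imm11) {
  return static_cast<uint32_t>(imm11) & 0x7FFu;
}
```

If the encoding is the canonical representation, the asm parser would apply the first conversion when it builds the operand and the instruction printer would apply the second, so the operand holds the halfword count rather than the byte offset written in the assembly.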

Regards,
-Jim

On Sep 26, 2012, at 11:26 AM, Greg Fitzgerald <garious at gmail.com> wrote:

> A question for LLVM code generator developers:
> 
> After having read through "The LLVM Target-Independent Code Generator"
> [1] I'm unclear about what precisely the objects MCInst and MCOperand
> represent.  They sit in the space between assembly syntax and binary
> encodings, but which are they modeling?  For example, a Thumb 2 branch
> instruction 'b' takes an immediate.  That syntax "b #1234" can map to
> a couple different encodings.  If it is an even number between -2048
> and 2046, it can be encoded with a 16-bit instruction, otherwise a
> 32-bit instruction.  If the MC objects are to model the syntax, then
> one would expect both encodings to have identical values in the
> MCOperand, a 32-bit signed integer.  On the other hand, if MC objects
> are to model the encoding, one would expect the MCOperand for the
> 16-bit encoding to contain a number between -1024 and 1023.  Which one
> is it?
> 
> My intuition says the MCOperand should model the assembly syntax and
> contain the 32-bit signed integer, and that the EncoderMethod and
> DecoderMethod are responsible for mapping that high-level number to
> the low-level binary representation.  If, however, the MCOperand
> models the encoding, then EncoderMethod and DecoderMethod glue need
> not exist, and that bit-twiddling logic would be pushed to whoever
> creates the MCOperand.
> 
> Looking at the Thumb backend, I believe it has been written assuming
> the MC objects model the syntax, not the encoding, which matches my
> intuition.  There has been some discussion on the llvm-commits list
> encouraging us to store the encoded value in the MCOperand.  The
> justification, as I understand it, is that the MCOperand should not
> contain values that cannot be encoded.  This effectively means that
> the MCOperands would be modeling the binary encoding, not the syntax.
> Are folks making this transition in other backends as well?
> 
> [1] http://llvm.org/docs/CodeGenerator.html
> 
> Thanks,
> Greg