[llvm-dev] Printing PC-relative offsets - how to get the instruction length?

Oliver Stannard via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 28 02:55:59 PDT 2019


Hi Mark,

I'd expect that to happen in two steps:

- A function in <target>MCCodeEmitter will convert the MCExpr operand into
  either the immediate which will be encoded in the instruction (for simple
  immediates), or add an MCFixup to the instruction (when the operand is a
  symbol or more complex expression). The function to be called is named by
  the EncoderMethod field in the tablegen description of the operand; in your
  case it is "getMemOpValue".

- The MCAssembler tries to resolve the fixup, either resolving it entirely
  within the assembler, or emitting a relocation for it. The main loop for this
  is at the end of MCAssembler::layout, and it calls target-specific code in
  <target>AsmBackend to modify the encoded instructions, and
  <target><objectformat>ObjectWriter to emit relocations.
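
A minimal standalone model of the first step may help. It does not use LLVM: `ToyOperand`, `ToyFixup` and `encodeOffset` are hypothetical stand-ins for MCOperand/MCExpr, MCFixup and an EncoderMethod such as getMemOpValue, and the 8-bit field width is assumed from LDAi8oPC. A constant goes straight into the encoding. Anything else leaves placeholder bits and records a fixup.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an MCOperand: either a known constant or a
// reference to a symbol whose value is not known yet.
struct ToyOperand {
  bool isImm;
  int64_t imm;        // valid when isImm is true
  std::string symbol; // valid when isImm is false
};

// Hypothetical stand-in for an MCFixup: "patch the bits at this byte offset
// later, using the value of this symbol".
struct ToyFixup {
  unsigned byteOffset;
  std::string symbol;
};

// Mirrors the job of an EncoderMethod like getMemOpValue: return the bits to
// place in the instruction, and record a fixup when they are not known yet.
uint64_t encodeOffset(const ToyOperand &op, unsigned byteOffset,
                      std::vector<ToyFixup> &fixups) {
  if (op.isImm)
    return static_cast<uint64_t>(op.imm) & 0xFF; // 8-bit field, as in LDAi8oPC
  fixups.push_back({byteOffset, op.symbol});
  return 0; // placeholder bits, filled in when the fixup is resolved
}
```

In the real code emitter the fixup also carries the MCExpr itself and a target fixup kind, so that the AsmBackend knows how to patch the bits later.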

The reason for the two phases is that we won't know whether a fixup needs a
relocation or not until we have parsed the whole file. For example, it might
reference a symbol defined after the instruction in the source.
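
The second phase can be modelled the same way. Again this is a hypothetical sketch rather than MCAssembler code: `resolveFixups` runs only after every instruction has been encoded and every symbol seen, patches the fixups whose symbol now has a value (the job of <target>AsmBackend::applyFixup), and returns the rest as stand-ins for relocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an MCFixup recorded during encoding.
struct ToyFixup {
  std::size_t byteOffset; // offset into the section's bytes
  std::string symbol;     // symbol whose value belongs there
};

// Outcome of the second phase: patched bytes, plus the fixups that could not
// be resolved and would be handed to the object writer as relocations.
struct LayoutResult {
  std::vector<uint8_t> bytes;
  std::vector<ToyFixup> relocations;
};

// Runs only after the whole file has been parsed, so a symbol defined after
// the instruction that uses it is already in `symbols`.
LayoutResult resolveFixups(std::vector<uint8_t> bytes,
                           const std::vector<ToyFixup> &fixups,
                           const std::map<std::string, int64_t> &symbols) {
  LayoutResult result;
  for (const ToyFixup &f : fixups) {
    auto it = symbols.find(f.symbol);
    if (it != symbols.end())
      bytes.at(f.byteOffset) = static_cast<uint8_t>(it->second); // resolved here
    else
      result.relocations.push_back(f); // left for the linker
  }
  result.bytes = std::move(bytes);
  return result;
}
```

In this model a symbol such as `foo` from `foo equ 12` is just a symbol table entry by the time the loop runs, so it makes no difference whether the equ appears before or after the lda.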

Oliver

> -----Original Message-----
> From: Mark R V Murray [mailto:mark at grondar.org]
> Sent: 28 March 2019 09:26
> To: Oliver Stannard
> Cc: llvm-dev at lists.llvm.org; nd
> Subject: Re: [llvm-dev] Printing PC-relative offsets - how to get the
> instruction length?
> 
> Hi Oliver,
> 
> Thanks! Both your answers got me on the right track!
> 
> Regarding the second, I'm now correctly parsing an immediate using an
> MCExpr if it is not an actual number. When does the MCExpr get resolved
> to an actual number? At assembly time? Or is it a link/fixup thing?
> 
> If I have a snippet of code like (e.g.):
> 
> foo	equ	12
> 	lda	foo,x
> 
> ... for a constant offset from the X index register. When, and by what,
> will foo get resolved to 12 for the LDA instruction?
> 
> M
> 
> > On 27 Mar 2019, at 14:56, Oliver Stannard <Oliver.Stannard at arm.com> wrote:
> >
> > Hi Mark,
> >
> > For your first question, the MCInstPrinter has a reference to the MCInstrInfo
> > object for your target, so something like this should give you the
> > instruction encoding size in bytes:
> >
> >  MII.get(MI->getOpcode()).getSize()
> >
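
Outside LLVM, the idea can be sketched with a toy table. `ToyInstrInfo` and `printPCRelImm` below are hypothetical; in MC6809InstPrinter the lookup would be `MII.get(MI->getOpcode()).getSize()`, which only returns something useful if each instruction's TableGen definition sets the `Size` field (it defaults to 0). The sizes 3 and 4 are inferred from the LDAi8oPC/LDAi16oPC encodings in this thread, and the sketch assumes, as Mark does, that the offset is added to the address of the following instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for MCInstrDesc: TableGen records a size per opcode.
struct ToyInstrDesc {
  unsigned size; // encoded length in bytes
};

// Hypothetical stand-in for the MCInstrInfo the printer holds as MII.
struct ToyInstrInfo {
  std::vector<ToyInstrDesc> descs;
  const ToyInstrDesc &get(unsigned opcode) const { return descs.at(opcode); }
};

// Prints a PC-relative offset as "$+N"/"$-N", counted from the start of the
// instruction: the offset is relative to the end of the instruction, so the
// instruction's own size is added instead of a hard-coded 2.
std::string printPCRelImm(const ToyInstrInfo &mii, unsigned opcode,
                          int64_t offset) {
  int64_t rel = offset + mii.get(opcode).size;
  std::ostringstream os;
  os << "$";
  if (rel >= 0)
    os << '+';
  os << rel;
  return os.str();
}
```

With a size per opcode, the 8- and 16-bit forms print correctly without a case statement in the printer.
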
> > For your second question, it looks like the MCK_Imm8 operand class is matching
> > the immediate even when it is out of range. This should be checked by a
> > function in your assembly parser. The ImmediateAsmOperand<"Imm8"> record (which
> > you didn't show the definition of, so I'm guessing a bit here) should have a
> > PredicateMethod value giving the name of that function. If that's not
> > specified, the default function name is based on the tablegen class name, which
> > won't be correct for both Imm8 and Imm16. Note that the ImmLeaf in the code
> > snippet you posted is only used for code generation from IR, not by the
> > assembler.
> >
> > Oliver
> >
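
A hypothetical sketch of the range check Oliver describes. `ToyAsmOperand` stands in for the target's parsed operand class, and the method names `isImm8`/`isImm16` are assumptions about what each ImmediateAsmOperand's PredicateMethod would name. The matcher function models only the behaviour that matters here: candidates are tried in the order they appear in the generated table (LDAi8oPC before LDAi16oPC), and the first whose predicate accepts the operand wins.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the target's parsed assembler operand.
struct ToyAsmOperand {
  int64_t value;
  // What a PredicateMethod such as "isImm8" could point at: accept only
  // values that fit the signed 8-bit offset field.
  bool isImm8() const { return value >= -128 && value <= 127; }
  // Likewise for the signed 16-bit offset field.
  bool isImm16() const { return value >= -32768 && value <= 32767; }
};

enum class ToyOpcode { LDAi8oPC, LDAi16oPC, NoMatch };

// Models trying the table entries for "lda n,pc" in order: with a real range
// check, 1000 is rejected by the 8-bit entry and falls through to the 16-bit one.
ToyOpcode matchLdaPC(const ToyAsmOperand &op) {
  if (op.isImm8())
    return ToyOpcode::LDAi8oPC;
  if (op.isImm16())
    return ToyOpcode::LDAi16oPC;
  return ToyOpcode::NoMatch;
}
```

If the 8-bit predicate accepts every immediate, the first entry always wins and the offset is truncated, which is the behaviour shown for `lda 1000,pc`.
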
> >> -----Original Message-----
> >> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> >> Mark R V Murray via llvm-dev
> >> Sent: 25 March 2019 16:19
> >> To: llvm-dev at lists.llvm.org
> >> Subject: [llvm-dev] Printing PC-relative offsets - how to get the
> >> instruction length?
> >>
> >> Hi
> >>
> >> In my MC6809 backend, in
> >> llvm/lib/Target/MC6809/InstPrinter/MC6809InstPrinter.cpp, I have the
> >> routine
> >>
> >> void MC6809InstPrinter::printPCRelImmOperand(const MCInst *MI, unsigned OpNo,
> >>                                              raw_ostream &O) {
> >>  const MCOperand &Op = MI->getOperand(OpNo);
> >>
> >>  if (Op.isImm()) {
> >>    int64_t Imm = Op.getImm() + 2;  <<<========================
> >>    O << "$";
> >>    if (Imm >= 0)
> >>      O << '+';
> >>    O << Imm;
> >>  } else {
> >>    assert(Op.isExpr() && "unknown pcrel immediate operand");
> >>    Op.getExpr()->print(O, &MAI);
> >>  }
> >> }
> >>
> >> Which works well enough, except for the constant 2 that I've arrowed: it
> >> needs to be the length of the binary instruction in bytes. The MC6809 has
> >> a *LOT* of variability here, so a case statement would be a right pain to
> >> maintain.
> >>
> >> An answer is tantalisingly close:
> >>
> >> $ bin/llvm-mc -triple mc6809 -show-inst-operands -show-inst -show-encoding <<< "lda 0,pc"
> >> 	.text
> >> <stdin>:1:1: note: parsed instruction: ['lda', 0, <register 13>]
> >> lda 0,pc
> >> ^
> >> 	lda	$+2,pc                  ; encoding: [0xa6,0x8c,0x00]  <<===========
> >>                                        ; <MCInst #1849 LDAi8oPC
> >>                                        ;  <MCOperand Imm:0>
> >>                                        ;  <MCOperand Imm:0>>
> >>
> >> The "encoding:" knows that I have a three-byte instruction, but that is
> >> generated by another chunk of code miles away. I suppose I could
> >> replicate that, but it seems wasteful. Is there a better way, not
> >> involving nasty layering violations, to get the length of an instruction
> >> in bytes in the context of llvm/lib/Target/*/InstPrinter/*InstPrinter.cpp?
> >>
> >> Also, both 8- and 16-bit variants are possible. The instruction picked is
> >> LDAi8oPC, which is the 8-bit offset version. If I supply a bigger offset:
> >>
> >> $ bin/llvm-mc -triple mc6809 -show-inst-operands -show-inst -show-encoding <<< "lda 1000,pc"
> >> 	.text
> >> <stdin>:1:1: note: parsed instruction: ['lda', 1000, <register 13>]
> >> lda 1000,pc
> >> ^
> >> 	lda	$+1002,pc               ; encoding: [0xa6,0x8c,0xe8]
> >>                                        ; <MCInst #1849 LDAi8oPC
> >>                                        ;  <MCOperand Imm:0>
> >>                                        ;  <MCOperand Imm:1000>>
> >>
> >> I still get the 8-bit variant instead of LDAi16oPC, and the operand is truncated.
> >>
> >> The TableGen-generated .inc file has
> >>
> >> { 444 /* lda */, MC6809::LDAi8oPC, Convert__imm_95_0__Imm81_0,
> >> AMFBS_None, { MCK_Imm8, MCK_PC }, },
> >> { 444 /* lda */, MC6809::LDAi16oPC, Convert__imm_95_0__Imm161_0,
> >> AMFBS_None, { MCK_Imm16, MCK_PC }, },
> >>
> >> ... so how do I get the 16-bit variant with MCK_Imm16 selected instead?
> >>
> >> The instructions are defined as
> >>
> >> def LDAi8oPC : MC6809LoadIndexed_i8oPC_P1<
> >>                (outs GR8:$dst8),
> >>                (ins pcoffset8:$offset),
> >>                !strconcat("lda", "\t", "${offset}", ",", "pc"),
> >>                0x00,
> >>                0xA6,
> >>                []
> >> > {
> >>   let Inst{23-16} = offset{7-0};
> >>   let Inst{15}    = 0b1;
> >>   let Inst{14-13} = 0b00;
> >>   let Inst{12-8}  = 0b01100;  // postbyte 0x8C: 8-bit offset from PC
> >>   let Inst{7-0}   = opcode;
> >> }
> >>
> >> def LDAi16oPC : MC6809LoadIndexed_i16oPC_P1<
> >>                (outs GR8:$dst8),
> >>                (ins pcoffset16:$offset),
> >>                !strconcat("lda", "\t", "${offset}", ",", "pc"),
> >>                0x00,
> >>                0xA6,
> >>                []
> >> > {
> >>   let Inst{31-24} = offset{7-0};
> >>   let Inst{23-16} = offset{15-8};
> >>   let Inst{15}    = 0b1;
> >>   let Inst{14-13} = 0b00;
> >>   let Inst{12-8}  = 0b01101;  // postbyte 0x8D: 16-bit offset from PC
> >>   let Inst{7-0}   = opcode;
> >> }
> >>
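
The Inst{...} assignments above can be checked against llvm-mc's output. The function below is a hypothetical hand-written equivalent of what the generated code emitter should produce for these two definitions, assuming bytes go out starting from Inst{7-0} (consistent with the [0xa6,0x8c,0x00] encoding shown earlier). The postbyte is binary 1 00 01100 (0x8C) for the 8-bit form and 1 00 01101 (0x8D) for the 16-bit form.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hand-written model of the LDAi8oPC / LDAi16oPC encodings: byte 0 is
// Inst{7-0}, byte 1 is Inst{15-8}, and so on.
std::vector<uint8_t> encodeLdaPC(int64_t offset, bool wide) {
  const uint8_t opcode = 0xA6; // Inst{7-0}
  // Inst{15} = 1, Inst{14-13} = 0b00, Inst{12-8} = 0b01100 or 0b01101
  const uint8_t postbyte = wide ? 0x8D : 0x8C;
  if (!wide) // Inst{23-16} = offset{7-0}; anything wider is silently dropped
    return {opcode, postbyte, static_cast<uint8_t>(offset & 0xFF)};
  return {opcode, postbyte,
          static_cast<uint8_t>((offset >> 8) & 0xFF), // Inst{23-16} = offset{15-8}
          static_cast<uint8_t>(offset & 0xFF)};       // Inst{31-24} = offset{7-0}
}
```

Encoding 1000 (0x03E8) with the 8-bit form gives [0xA6, 0x8C, 0xE8], the truncated result seen above, whereas the 16-bit form gives the four bytes [0xA6, 0x8D, 0x03, 0xE8].
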
> >> and I have
> >>
> >> def pcoffset8 : Operand<i8>,
> >>                 ImmLeaf<i8, [{ return Immediate >= -128 && Immediate <= 127; }]> {
> >>  let PrintMethod = "printPCRelImmOperand";
> >>  let MIOperandInfo = (ops i8imm);
> >>  let ParserMatchClass = ImmediateAsmOperand<"Imm8">;
> >>  let EncoderMethod = "getMemOpValue";
> >>  let DecoderMethod = "DecodeMemOperand";
> >> }
> >>
> >> def pcoffset16 : Operand<i16>,
> >>                  ImmLeaf<i16, [{ return Immediate >= -32768 && Immediate <= 32767; }]> {
> >>  let PrintMethod = "printPCRelImmOperand";
> >>  let MIOperandInfo = (ops i16imm);
> >>  let ParserMatchClass = ImmediateAsmOperand<"Imm16">;
> >>  let EncoderMethod = "getMemOpValue";
> >>  let DecoderMethod = "DecodeMemOperand";
> >> }
> >>
> >> M
> >> --
> >> Mark R V Murray
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> --
> Mark R V Murray


