[llvm-dev] Getting up to speed with llvm backends. Machine Instruction operands.

Thu Mar 12 09:05:47 PDT 2020

Walter Zambotti via llvm-dev <llvm-dev at lists.llvm.org> writes:

> I have also cloned llvm 9.0.1 and started looking at some of the targets.  A
> little overwhelming!

I would work off master if possible.  LLVM APIs are a moving target.

> The operands of a machine instruction can be of several different types: a
> register reference, a constant integer, a basic block reference, etc.
>
> Where are these operand types defined or documented (especially the etcs)?

Look in include/llvm/CodeGen/MachineOperand.h.  The MachineOperandType
enum in that header lists every operand kind.
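
To give a feel for what that header models, here is a rough standalone
sketch.  It is not the real llvm::MachineOperand class: OperandKind,
Operand and describe() are names I made up for illustration, and only
the MO_* names in the comments come from the header.  It covers just
the kinds whose payload is a plain number.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for llvm::MachineOperand.
// The real enum also has kinds such as MO_CImmediate, MO_FPImmediate,
// MO_ExternalSymbol, MO_GlobalAddress, MO_BlockAddress, MO_RegisterMask,
// MO_Metadata and MO_MCSymbol, which carry pointers rather than numbers.
enum class OperandKind {
  Register,          // cf. MO_Register
  Immediate,         // cf. MO_Immediate
  MachineBasicBlock, // cf. MO_MachineBasicBlock (here: a block number)
  FrameIndex,        // cf. MO_FrameIndex
  ConstantPoolIndex, // cf. MO_ConstantPoolIndex
  JumpTableIndex,    // cf. MO_JumpTableIndex
  TargetIndex        // cf. MO_TargetIndex
};

struct Operand {
  OperandKind Kind;
  int64_t Value; // meaning depends on Kind
};

// Render an operand as text, dispatching on its kind.
std::string describe(const Operand &Op) {
  const std::string V = std::to_string(Op.Value);
  switch (Op.Kind) {
  case OperandKind::Register:          return "reg:" + V;
  case OperandKind::Immediate:         return "imm:" + V;
  case OperandKind::MachineBasicBlock: return "bb:" + V;
  case OperandKind::FrameIndex:        return "fi:" + V;
  case OperandKind::ConstantPoolIndex: return "cp:" + V;
  case OperandKind::JumpTableIndex:    return "jt:" + V;
  case OperandKind::TargetIndex:       return "ti:" + V;
  }
  return "unknown";
}
```

The real class exposes the same idea through predicates such as
isReg() and isImm() plus matching accessors.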

> How do these operand types relate to the operands specified in the
> instruction selection and selection patterns?

That is entirely up to the target to define.  You write a combination of
DAG patterns to match (using TableGen) and custom C++ lowering code to
map SelectionDAG nodes to MachineInstrs (which reference
MachineOperands).  Most of the time these are one-to-one mappings (a
SelectionDAG node maps to an instruction defining a virtual register, a
SelectionDAG constant maps to a MachineOperand constant, SelectionDAG
operands map to virtual register references, etc.)
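
A toy model of that one-to-one mapping might look like the following.
Everything in it is an assumption made for illustration: DagNode,
MachineInst, Selector and the opcode names ADDri/ADDrr/LDI are
hypothetical and much simpler than SelectionDAG or MachineInstr.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical DAG node: a virtual register, a constant, or an add.
struct DagNode {
  enum Kind { VReg, Constant, Add } K;
  int Value;          // vreg number (VReg) or constant value (Constant)
  const DagNode *LHS; // operands, used only by Add
  const DagNode *RHS;
};

// Hypothetical machine operand: register reference or immediate.
struct MachineOp {
  enum Kind { Reg, Imm } K;
  int Value;
};

struct MachineInst {
  std::string Opcode;
  std::vector<MachineOp> Ops; // Ops[0] is the register being defined
};

struct Selector {
  std::vector<MachineInst> Out;
  int NextVReg = 100; // arbitrary starting number for new vregs

  // Emits instructions for N; returns the vreg holding N's value.
  int select(const DagNode &N) {
    switch (N.K) {
    case DagNode::VReg:
      return N.Value; // already a register reference, nothing to emit
    case DagNode::Constant: {
      int Dst = NextVReg++;
      Out.push_back(MachineInst{
          "LDI", {{MachineOp::Reg, Dst}, {MachineOp::Imm, N.Value}}});
      return Dst;
    }
    case DagNode::Add: {
      int L = select(*N.LHS);
      int Dst;
      if (N.RHS->K == DagNode::Constant) {
        // The DAG constant becomes an immediate operand directly.
        Dst = NextVReg++;
        Out.push_back(MachineInst{"ADDri",
                                  {{MachineOp::Reg, Dst},
                                   {MachineOp::Reg, L},
                                   {MachineOp::Imm, N.RHS->Value}}});
      } else {
        int R = select(*N.RHS);
        Dst = NextVReg++;
        Out.push_back(MachineInst{"ADDrr",
                                  {{MachineOp::Reg, Dst},
                                   {MachineOp::Reg, L},
                                   {MachineOp::Reg, R}}});
      }
      return Dst;
    }
    }
    return -1;
  }
};
```

In a real backend most of this is generated by TableGen from the DAG
patterns; only the unusual cases need hand-written C++.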

> As mentioned above, LLVM’s agnosticism regarding pointers initially
> makes it impossible to comply with the EABI as there is no way to tell
> whether an integer argument should go into an address register or a
> data register.
>
> However this document is dated circa 2008/2009 and I ask if this situation
> still remains the same today.

Yes.  There is no machine-level "pointer type."

> I ask because the backend I would like to write targets the
> Hitachi/Motorola 6309/6809, which also provides dedicated indexing
> (addressing) registers. In fact, in all binary operations the second
> operand is either an immediate or some kind of memory reference via
> an index/address register.

I'm not familiar with this architecture, but I will try to answer your
questions as best I can.

> So given the machine instruction:
>
>     add d ,x   # to the d register add what the x register points at
>
> further examples of the second argument are:
>
>     ,x+    # what register x points to, then post-inc x  i.e. *x++
>     10,y   # what register y + 10 points to              i.e. *(y+10)
>     [20,u] # what register u + 20 points to points to    i.e. **(u+20)
>     w,y    # what register y + register w points to      i.e. *(y+w)
>
>
> Is there a way to pattern match these kinds of operands?

Yes.  The AArch64 backend might be a good guide as it supports pre- and
post-increment.  I don't know if any existing target has
pointer-to-pointer operands (that's kind of a strange thing as it
requires two memory operands in one instruction) but I don't think it
would be super difficult to add.  The X86 backend has special matching
code to construct its more complex addressing modes.
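
As a rough idea of what such matching code does, here is a standalone
sketch that classifies an address expression into the indexed forms
from your examples (plain base, constant offset, register offset, and
the extra indirection).  It is a model only: Expr, IndexedAddr,
matchAddress and the register numbers are all hypothetical, and
post-increment is left out because it also changes the base register.

```cpp
#include <cassert>

// Hypothetical address expression tree.
struct Expr {
  enum Kind { Reg, Const, Add, Load } K;
  int Value;     // register id (Reg) or constant (Const)
  const Expr *A; // Add: left operand; Load: the address loaded from
  const Expr *B; // Add: right operand
};

// Result of a successful match, one field per operand of the mode.
struct IndexedAddr {
  int Base = -1;         // index register, e.g. the y in "10,y"
  int Offset = 0;        // constant offset, e.g. the 10 in "10,y"
  int OffsetReg = -1;    // register offset, e.g. the w in "w,y"
  bool Indirect = false; // extra level of indirection, e.g. "[20,u]"
};

// Matches reg, reg + const, or reg + reg.
static bool matchDirect(const Expr &E, IndexedAddr &AM) {
  if (E.K == Expr::Reg) {
    AM.Base = E.Value;
    return true;
  }
  if (E.K == Expr::Add && E.A->K == Expr::Reg) {
    if (E.B->K == Expr::Const) {
      AM.Base = E.A->Value;
      AM.Offset = E.B->Value;
      return true;
    }
    if (E.B->K == Expr::Reg) {
      AM.Base = E.A->Value;
      AM.OffsetReg = E.B->Value;
      return true;
    }
  }
  return false;
}

// Returns true and fills AM if E fits one of the modelled modes.
// AM is left untouched on failure.
bool matchAddress(const Expr &E, IndexedAddr &AM) {
  IndexedAddr Tmp;
  const Expr *Inner = &E;
  if (E.K == Expr::Load) { // address is itself loaded from memory
    Tmp.Indirect = true;
    Inner = E.A;
  }
  if (!matchDirect(*Inner, Tmp))
    return false;
  AM = Tmp;
  return true;
}
```

A target's real matcher has the same shape: return false when the
expression does not fit, otherwise hand back the pieces that become
the instruction's operands.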

> In MachineOperand.h I see this operand type.  I assume I can match to
> it?!?!?
>
>     MO_TargetIndex,       ///< Target-dependent index+offset operand.

I think the interpretation of this operand type is up to the target
(hence, "Target-dependent").  So yes, I think you could use it.

> The x86 has a very flexible way of accessing memory. It is capable of
> forming memory addresses of the following 
>
> I then went and looked at the files in target/x86 and I have to admit
> I got lost trying to find where and how this is implemented.

For X86, the magic happens with the "addr" TableGen definition (a
ComplexPattern) in X86InstrInfo.td, which is used to type address
operands of Load/Store nodes in the DAG patterns.  This ends up calling
into custom matching code, "selectAddr", which lives in
X86ISelDAGToDAG.cpp.  There are similar variants of "addr" in the
TableGen files, each with different requirements (for example alignment
restrictions for vector instructions).
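
To make the idea concrete, here is a small standalone sketch of the
kind of folding such matching code performs: it tries to squeeze an
expression into base + index*scale + displacement, with scale limited
to 1, 2, 4 or 8.  This is not the code in X86ISelDAGToDAG.cpp; Node,
X86Addr and fold are hypothetical names, and the real matcher handles
many more cases (segments, symbols, frame indexes and so on).

```cpp
#include <cassert>

// Hypothetical expression node.
struct Node {
  enum Kind { Reg, Const, Add, Shl } K;
  long Value;    // register id (Reg) or constant (Const)
  const Node *L; // operands of Add/Shl
  const Node *R;
};

// Address in the form Base + Index*Scale + Disp.
struct X86Addr {
  int Base = -1;
  int Index = -1;
  int Scale = 1;
  long Disp = 0;
};

// Tries to fold N into AM; returns false (AM unchanged for Add) if N
// does not fit in the remaining free slots.
bool fold(const Node &N, X86Addr &AM) {
  switch (N.K) {
  case Node::Const:
    AM.Disp += N.Value; // constants accumulate in the displacement
    return true;
  case Node::Reg:
    if (AM.Base < 0) {
      AM.Base = static_cast<int>(N.Value);
      return true;
    }
    if (AM.Index < 0) {
      AM.Index = static_cast<int>(N.Value);
      AM.Scale = 1;
      return true;
    }
    return false; // both register slots already taken
  case Node::Shl:
    // reg << c with c in 1..3 becomes index * {2, 4, 8}.
    if (AM.Index < 0 && N.L->K == Node::Reg && N.R->K == Node::Const &&
        N.R->Value >= 1 && N.R->Value <= 3) {
      AM.Index = static_cast<int>(N.L->Value);
      AM.Scale = 1 << N.R->Value;
      return true;
    }
    return false;
  case Node::Add: {
    X86Addr Saved = AM; // roll back if either side fails
    if (fold(*N.L, AM) && fold(*N.R, AM))
      return true;
    AM = Saved;
    return false;
  }
  }
  return false;
}
```

For example, folding (r1 + (r2 << 2)) + 16 yields base r1, index r2,
scale 4 and displacement 16, while a sum of three registers is
rejected.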

Look at the instruction patterns in X86InstrArithmetic.td to see how
"addr" is used.

Hopefully that will get you started.

> At this (learning) stage I would appreciate any input or pointers
> including any other documentation or tutorials that might help in
> relation to how I can implement indexed memory addressing operands.

For backend work it is absolutely necessary to understand TableGen.

http://llvm.org/docs/TableGen/index.html
http://llvm.org/docs/TableGen/LangIntro.html
http://llvm.org/docs/TableGen/LangRef.html
http://llvm.org/docs/TableGen/BackEnds.html

The way existing code generators work will make much more sense after
reading at least the first three above.  The fourth is mostly brief
summaries of the different things TableGen is used to generate.
TableGen started out mostly as a code generator generator but has
become much more over the years.

                       -David