[llvm-dev] Getting up to speed with llvm backends. Machine Instruction operands.

Walter Zambotti via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 12 00:22:21 PDT 2020

Welcome to all


These are questions from a veteran programmer with no LLVM backend experience
who is evaluating LLVM for creating a Hitachi 6309 backend.


This post is about finding out more about machine instruction operands.


The documentation I have read so far includes:


- the online manuals

- Building an LLVM Backend. Fraser Cormack and Pierre-André Saulais

- The Design of a Custom 32-bit RISC CPU and LLVM Compiler Backend. Connor
Jan Goldberg

- Design and Implementation of a TriCore Backend for the LLVM Compiler
Framework. Christoph Erhardt


I have also cloned llvm 9.0.1 and started looking at some of the targets.  A
little overwhelming!


At this point I'm at information overload!


From "The LLVM Target-Independent Code Generator", section "The MachineInstr
class":


The operands of a machine instruction can be of several different types: a
register reference, a constant integer, a basic block reference, etc.


Where are these operand types defined or documented (especially the etcs)?


How do these operand types relate to the operands specified in the
instruction selection and selection patterns?


A concern I have is raised in "Design and Implementation of a TriCore
Backend for the LLVM Compiler Framework", where the instruction set is
non-orthogonal (it contains special-purpose address registers):


The strict distinction between pointers and integers is highly problematic
because LLVM’s code generator implicitly converts all pointers to integers of
the same width ... upon construction of the SelectionDAG.


As mentioned above, LLVM’s agnosticism regarding pointers initially makes it
impossible to comply with the EABI as there is no way to tell whether an
integer should go into an address register or a data register.


However, this document dates from around 2008/2009, so I would like to know
whether this situation is still the same.


I ask because the backend I would like to write targets the Hitachi/Motorola
6309/6809, which also provides dedicated index (address) registers. In fact,
in all binary operations the second operand is either an immediate or some
kind of memory reference via an index/address register.


The syntax being:


               {[}{OffsetReg | Disp{5,8,16}},{- | --}IndexReg{+ | ++ | ]}


OffsetReg can be an 8-bit or 16-bit accumulator (so only certain registers are allowed)

Disp (the displacement) can be 5, 8 or 16 bits, signed

IndexReg can only be one of the special index registers, the PC or a stack pointer

+ ++ is post-increment by 1, 2 respectively

- -- is pre-decrement by 1, 2 respectively

[ ] the entire effective address is a pointer to a pointer

[ ] and any incrementers/decrementers are mutually exclusive
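
To make these constraints concrete, here is a minimal sketch in standalone C++ of how one such operand could be modelled and checked. None of the names (IndexedOperand, dispBits, isValid) come from LLVM or from an existing backend; they are hypothetical, and the check encodes only the rules exactly as listed above, so a real 6809/6309 encoder may need finer ones.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one indexed operand, following the syntax above.
// These names are illustrative only and do not exist in LLVM.
enum class OffsetKind { None, Accumulator, Displacement };
enum class AutoStep { None, PostInc1, PostInc2, PreDec1, PreDec2 };

struct IndexedOperand {
  OffsetKind offset = OffsetKind::None;
  int32_t disp = 0;               // meaningful only for OffsetKind::Displacement
  AutoStep step = AutoStep::None; // + ++ - -- forms
  bool indirect = false;          // true for the [ ... ] form
};

// Smallest signed displacement width (5, 8 or 16 bits) that can hold disp,
// or 0 if the value does not fit in 16 bits.
inline int dispBits(int32_t disp) {
  if (disp >= -16 && disp <= 15) return 5;
  if (disp >= -128 && disp <= 127) return 8;
  if (disp >= -32768 && disp <= 32767) return 16;
  return 0;
}

// Checks only the constraints stated in the list above:
//  - [ ] and auto increment/decrement are mutually exclusive
//  - a displacement must fit in 16 signed bits
inline bool isValid(const IndexedOperand &op) {
  if (op.indirect && op.step != AutoStep::None) return false;
  if (op.offset == OffsetKind::Displacement && dispBits(op.disp) == 0)
    return false;
  return true;
}
```

For example, an operand with a displacement of 20 and the indirect flag set (the [20,u] shape) passes isValid, while an indirect operand with a post-increment does not.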


So, given the machine instruction:


               add d ,x  # to the d register, add what the x register points to


further examples of the second argument are:


,x+    # what register x points to, then post-increment x    i.e. *x++

10,y   # what register y + 10 points to                      i.e. *(y+10)

[20,u] # what register u + 20 points to, dereferenced again  i.e. **(u+20)

w,y    # what register y + register w points to              i.e. *(y+w)
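
The intended meaning of these four operand forms can be sketched as a toy simulation in standalone C++. The Machine struct and the helper names are hypothetical and exist only for illustration. Each helper loads a single byte for simplicity, and the pointer fetched by the indirect form is read big-endian, as on the 6809 family.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy machine: 64 KiB of byte memory plus the registers used in the examples.
// Purely illustrative; not part of LLVM or of any real simulator.
struct Machine {
  std::vector<uint8_t> mem = std::vector<uint8_t>(65536, 0);
  uint16_t x = 0, y = 0, u = 0, w = 0;

  uint8_t load8(uint16_t addr) const { return mem[addr]; }

  // 16-bit big-endian load, used to fetch the pointer for the [ ] form.
  uint16_t load16(uint16_t addr) const {
    return uint16_t(mem[addr] << 8 | mem[uint16_t(addr + 1)]);
  }
};

// ,x+    use x as the address, then add 1 to x        i.e. *x++
inline uint8_t postInc1(Machine &m) {
  uint16_t ea = m.x;
  m.x = uint16_t(m.x + 1);
  return m.load8(ea);
}

// 10,y   address is y plus a signed displacement      i.e. *(y+10)
inline uint8_t dispY(const Machine &m, int16_t disp) {
  return m.load8(uint16_t(m.y + disp));
}

// [20,u] u + disp holds a pointer that is followed    i.e. **(u+20)
inline uint8_t indirectU(const Machine &m, int16_t disp) {
  return m.load8(m.load16(uint16_t(m.u + disp)));
}

// w,y    address is y plus the contents of w          i.e. *(y+w)
inline uint8_t regOffsetY(const Machine &m) {
  return m.load8(uint16_t(m.y + m.w));
}
```

The point of the sketch is that every form reduces to "compute an effective address, optionally update the index register, then load", which is the behaviour a selection pattern for these operands would have to capture.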


Is there a way to pattern match these kinds of operands?


In MachineOperand.h I see this operand type, which I assume I can match against:


    MO_TargetIndex,       ///< Target-dependent index+offset operand.


At https://llvm.org/docs/CodeGenerator.html#x86-addressing-mode the
documentation says:


The x86 has a very flexible way of accessing memory. It is capable of
forming memory addresses of the following expression directly in integer
instructions (which use ModR/M addressing):


SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32


In order to represent this, LLVM tracks no less than 5 operands for each
memory operand of this form. This means that the “load” form of ‘mov’ has
the following MachineOperands in this order:


Index:        0     |    1        2       3           4          5
Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg


Stores, and all other instructions, treat the four memory operands in the
same way and in the same order. If the segment register is unspecified
(regno = 0), then no segment override is generated. “Lea” operations do not
have a segment register specified, so they only have 4 operands for their
memory reference.
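
As I understand the quoted layout, one memory reference is flattened into consecutive operands of the instruction. The following standalone C++ sketch models that idea. It does not use LLVM's classes: Operand, MemSlot, makeLoad and effectiveAddress are hypothetical names, and treating register number 0 as "no register" for the base and index is my assumption (the documentation states it only for the segment register).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a machine operand: a register number or an
// immediate. This is not LLVM's MachineOperand class.
struct Operand {
  bool isReg;
  int64_t value; // register number if isReg, otherwise the immediate
};

// Position of each memory sub-operand relative to the first one, in the
// order shown in the table above.
enum MemSlot { Base = 0, Scale = 1, Index = 2, Disp = 3, Segment = 4 };

// Flat operand list for the "load" form: destination register first, then
// the five memory operands.
inline std::vector<Operand> makeLoad(int dest, int base, int64_t scale,
                                     int index, int64_t disp, int segment) {
  return {{true, dest},  {true, base},  {false, scale},
          {true, index}, {false, disp}, {true, segment}};
}

// Base + Scale * Index + Disp, ignoring the segment. regs maps a register
// number to its current value. Register 0 is treated as "no register"
// (an assumption of this sketch).
inline int64_t effectiveAddress(const std::vector<Operand> &ops,
                                std::size_t memStart,
                                const std::vector<int64_t> &regs) {
  auto regValue = [&](const Operand &op) -> int64_t {
    return op.value == 0 ? 0 : regs[static_cast<std::size_t>(op.value)];
  };
  return regValue(ops[memStart + Base]) +
         ops[memStart + Scale].value * regValue(ops[memStart + Index]) +
         ops[memStart + Disp].value;
}
```

With register 1 holding 1000 and register 2 holding 3, makeLoad(5, 1, 4, 2, 8, 0) describes a load from 1000 + 4 * 3 + 8, and the memory operands start at index 1 of the list. A 6309 indexed operand could presumably be flattened in a similar way (offset register or displacement, index register, and flags for the increment and indirect forms), but that part is speculation on my side.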


I then went and looked at the files in target/x86, and I have to admit I got
lost trying to find where and how this is implemented.


At this (learning) stage I would appreciate any input or pointers, including
any other documentation or tutorials that might help me understand how to
implement indexed memory addressing operands.


I would appreciate any comments.




