[LLVMdev] Backend: 2 address + 17bit immediate

Thu Mar 22 07:38:14 PDT 2007

Hello,
I'm (trying) to write a backend for a simple 32-bit processor architecture
with a single instruction format and no condition code registers.
www.docm.mmu.ac.uk/STAFF/A.Nisbet/Sabre.pdf is the short 15-page document
describing the architecture of Sabre, a Celoxica-developed
research/teaching processor. Pages 5-8 contain the information relevant to
targeting it from a new compiler backend; it is trivially simple, with
25 actual instructions. Note the typo on page 5: operand A is clearly bits 9-5.

The general form for instructions is:

opcode %a, %b, 17bit signed immediate.

%b is a source register.
%a is typically both the source and the destination register for the
operation, i.e. %a = operation %a, %b, immediate.
%b and the immediate act like a virtual operand c that is the sum
of register b's contents and the immediate value.
%b can be omitted if it refers to the "zero valued register %0".
The immediate can be omitted if it has a zero value.
The exceptions to this are the various forms of conditional branch
instructions, which must compare the contents of 2 registers and specify a
branch target address using the immediate (textually the immediate is a
label; in machine code it is an offset relative to the PC).
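
As a rough illustration of the operand semantics described above (this is
not taken from the Sabre document), the 17-bit signed immediate range and
the "virtual operand c" could be modelled in C++ as below. The names
fitsSImm17, operandC and execAdd are hypothetical, and 32-bit wrap-around
arithmetic is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Range of a 17-bit two's-complement immediate: [-65536, 65535].
constexpr int32_t kImm17Min = -(1 << 16);
constexpr int32_t kImm17Max = (1 << 16) - 1;

// True if `value` can be used directly as the 17-bit signed immediate.
constexpr bool fitsSImm17(int64_t value) {
  return value >= kImm17Min && value <= kImm17Max;
}

// Hypothetical model of the "virtual operand c": the contents of %b plus
// the immediate. Wrap-around on 32 bits is an assumption.
constexpr uint32_t operandC(uint32_t regB, int32_t imm17) {
  return regB + static_cast<uint32_t>(imm17);
}

// Hypothetical ADD under the general form: %a = %a + c.
constexpr uint32_t execAdd(uint32_t regA, uint32_t regB, int32_t imm17) {
  return regA + operandC(regB, imm17);
}

static_assert(fitsSImm17(65535) && !fitsSImm17(65536), "upper bound");
static_assert(fitsSImm17(-65536) && !fitsSImm17(-65537), "lower bound");
```

With %b as the zero-valued register, operandC(0, imm) is just the
immediate; with a zero immediate it is just the contents of %b.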

I have spent some time looking at the PPC and SPARC backends, but obviously
these are much more complicated than what I need to implement.
Consequently, I am not correctly grasping the interactions between
ARCHInstrInfo.td and ARCHDAGToDAGISel.cpp. I did manage to hack something
together based on a copy of SPARC (with a SABRE namespace etc.), but the
instruction selection was incorrect and I obtained a "Cannot yet
select:0x..." assertion failure from SABREDAGToDAGIsel::SelectCode when I
attempted:
llc -march sabre helloworld.bc -o helloworld.s

Can anyone offer any guidance on how to debug instruction selection
issues? Or perhaps a description of how pattern matching and instruction
selection work, with a verbose explanation for a single instruction (this
would probably be more beneficial), relating the processor instruction set
to the LLVM-supported instruction set and the actual code
generation/printing.

WRT defining the instructions themselves: am I right in thinking that it is
sensible (for instruction selection) to represent the instruction set as a
collection of instructions targeting register-register and
register-immediate forms, so that, for example, I would create the
following defs?
ADDrr to match ADD %a,%b
ADDri to match ADD %a, immediate
I have used a multiclass to achieve this. Previously I was attempting to
match the general opcode %a,%b,immediate form.
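
One way to see why the rr/ri split is compatible with the single hardware
format: both forms are special cases of the general form, with either the
immediate equal to zero or %b being the zero-valued register. The C++
sketch below only models that relationship and the operand-omission rules
given earlier; it is not LLVM API. SabreInst, makeRR, makeRI and printInst
are hypothetical names, and treating %0 as register number 0 is an
assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of the general form: opcode %a, %b, imm.
struct SabreInst {
  unsigned a;   // source and destination register
  unsigned b;   // second source; 0 is assumed to be the zero-valued %0
  int32_t imm;  // 17-bit signed immediate
};

// Register-register form: the immediate is zero.
inline SabreInst makeRR(unsigned a, unsigned b) { return {a, b, 0}; }

// Register-immediate form: %b is the zero-valued register.
inline SabreInst makeRI(unsigned a, int32_t imm) { return {a, 0, imm}; }

// Prints the instruction, omitting %b when it is %0 and omitting the
// immediate when it is zero, as the text above describes.
inline std::string printInst(const std::string &opcode,
                             const SabreInst &inst) {
  std::string text = opcode + " %" + std::to_string(inst.a);
  if (inst.b != 0) text += ", %" + std::to_string(inst.b);
  if (inst.imm != 0) text += ", " + std::to_string(inst.imm);
  return text;
}
```

For example, printInst("ADD", makeRR(1, 2)) gives "ADD %1, %2" and
printInst("ADD", makeRI(1, 5)) gives "ADD %1, 5".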

Clearly I also need a way to load a 32-bit constant value into a register
in order to be able to address more than 64K of memory. I know the PPC
does something similar ...

So, for example, for SABRE this instruction output would perform the
necessary ...
MOVri %a, HI16(32 bit constant)
LSHri %a,16
ORri %a, LO16(same 32 bit constant)
LD %d, %a // i.e. load the contents of the memory at the address stored in %a
          // into register %d

where the HI16/LO16 computations are performed at code generation time by
LLVM. I'm a little confused as to how to specify this as a pattern in
TableGen syntax, even with the PPC example.
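
To make the intended arithmetic concrete, here is a small C++ sketch of the
HI16/LO16 split and of the MOVri/LSHri/ORri sequence that rebuilds the
constant. It models only the arithmetic, not the TableGen pattern. The
function names hi16, lo16 and materialize are hypothetical, and it assumes
ORri uses the immediate value unchanged. Both halves lie in 0..65535, which
is inside the 17-bit signed immediate range, so under that assumption no
carry adjustment of the high half should be needed.

```cpp
#include <cassert>
#include <cstdint>

// Upper and lower 16-bit halves of a 32-bit constant.
constexpr uint32_t hi16(uint32_t v) { return v >> 16; }
constexpr uint32_t lo16(uint32_t v) { return v & 0xFFFFu; }

// Largest value a 17-bit signed immediate can hold.
constexpr uint32_t kImm17Max = (1u << 16) - 1;

// Mirrors the three-instruction sequence from the text, step by step.
inline uint32_t materialize(uint32_t value) {
  uint32_t a = hi16(value);  // MOVri %a, HI16(value)
  a <<= 16;                  // LSHri %a, 16
  a |= lo16(value);          // ORri  %a, LO16(value)
  return a;
}

// Each half fits in the immediate field, even for an all-ones constant.
static_assert(hi16(0xFFFFFFFFu) <= kImm17Max, "high half fits");
static_assert(lo16(0xFFFFFFFFu) <= kImm17Max, "low half fits");
```

For instance, materialize(0x12345678) uses the immediates 0x1234 and 0x5678
and returns 0x12345678.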

Apologies for the naivety of these questions.

Thanks,
         Andy

      Dr. Andy Nisbet: URL http://www.docm.mmu.ac.uk/STAFF/A.Nisbet
Department of Computing and Mathematics, John Dalton Building, Manchester
        Metropolitan University, Chester Street, Manchester M1 5GD, UK.
Email: A.Nisbet at mmu.ac.uk, Phone:(+44)-161-247-1556; Fax:(+44)-161-247-1483.