[LLVMdev] Instruction descriptions question

Sun Oct 1 05:06:12 PDT 2006

Hi,

I'm trying to implement a new backend for an embedded CISC processor.
Therefore I thought that it makes sense to take X86 target as a basis,
to save some time. 

But when I look into the X86InstrInfo.td, I have a very strong feeling
that it is one of the most complex instruction set descriptions
compared to other targets. I can imagine that this is due to the
complexity of X86's instruction set. But still I have a feeling that it
is overcomplicated, but may be it is just an initial impression.

Since I just started, here are my first several questions and
proposals. I hope that some of them make sense for you.

1. Why does X86 instruction set description provide different
descriptions for the same instructions, which differ only in the size
of operands?
E.g.

def MOV8rm  : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src),
                "mov{b} {$src, $dst|$dst, $src}",
                [(set GR8:$dst, (load addr:$src))]>;
def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src),
                "mov{w} {$src, $dst|$dst, $src}",
                [(set GR16:$dst, (load addr:$src))]>, OpSize;
def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src),
                "mov{l} {$src, $dst|$dst, $src}",
                [(set GR32:$dst, (load addr:$src))]>;

For my target processor, only the types of operands (e.g. rm) are
really important. The size is encoded into the opcode, but otherwise it
does not affect anything except for constraints on the sizes of
operands. And constraint is basically that the size of both operands
should be the same.

Wouldn't it be possible and even more clean to have just one
description like (I use a pseudo-description here):

def MOVrr  : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst, 
(i8mem|i16mem|i32mem):$src),
                "mov{b} {$src, $dst|$dst, $src}", []>, isSameSize($dst,
$src);

The semantic of such a description would mean that $dst should be one
of GR8, GR16, GR32 and $dst is one of i8mem, i16mem, i32mem with the
additional constraint that the sizes of both operands are the same
(this is checked by isSameSize predicate). 

More over, in very many cases, the operation (e.g. OR) itself affects
only the encoding of the operation inside the opcode and of course the
description of the effect and assembler syntax e.g. [(store (or (load
addr:$dst), GR8:$src), addr:$dst)]. For OR insn the opreation should be
or, and for AND insn - obviously and. But operands and operand
combinations have the same limitations and descriptions also for many
other operations, e.g. XOR, AND, etc.
Wouldn't it make sense to provide an easier way to describe it?
For example, one would describe operands combinations (e.g. RR, RM)
separately and then just say which instructions they are applicable
for? May be using something like:

def rm : (GR32 :$dst, i32mem:$src)
{
  "$assm_op{l} {$src, $dst|$dst, $src}",
  [(set GR32:$dst, ($insn_effect_op GR16:$dst, (load addr:$src)))]     

};

Mapping from instruction name to instruction syntax and/or effect can
be also simplified to something like:

def AND -> assm_op="and", insn_effect_op="and";
def OR -> assm_op"or", insn_effect_op="or";

let rm in {
  def OR    : ...;
  def AND   : ...;
  def MOV   : ...;
  ...
}

The effect would be a set of instructions descriptions with names 
ORrm, ANDrm, MOVrm and required type of operands. Effectively, it
produces the same descriptions that would be written by hand (in this
regard it works pretty much as a preprocessor that expands macro
definitions). The advantage is that descriptions become much shorter
and cleaner, because parts common for multiple instructions are
factored out into a single description element.

I ask it because I have used BURG based instruction selectors (e.g. the
one from LCC compiler and some others) before and there is was quite
easy to express something like that (though not everything described
above) in the tree grammar for a code selector,e.g.
 gpr  -> GR8 | GR16| GR32;
 mem  -> i8mem | i16mem | i32mem;  
 insn -> mov(gpr:src, mem:dst) | operand_size_of(src) ==
operand_size_of(src)

Is it possible to express something like that using TableGen
descriptions? Or are they so limited at the moment that it is not
possible and this is the reason for so elaborative descriptions for the
X86 target. If it is not possible yet, are there any plans to implement
something like that for providing the possibilities for even cleaner,
shorter and more expressive and concise descriptions? Of course, all
extensions the I described are just sketches, but what do you think
about this approach as such?

2. Another related question is about instruciton costs. In BURG-based
selectors there is usually a possibility to describe costs for the
instructions so that a least-cost cover can be found during the
instruction selection process. I have not seen any such cost
descriptions in TableGen files. Does it mean that it is not supported?
Why? As far as I understand, LLVM uses BURG/IBURG for instruction
selection and both of these tools support costs, AFAIK.

Best Regards,
  Roman

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com