[LLVMdev] Instruction descriptions question

Mon Oct 2 12:37:42 PDT 2006

Hi Chris,

Thanks a lot for your answer!

Chris Lattner wrote:
>> 1. Why does X86 instruction set description provide different
>> descriptions for the same instructions, which differ only in the
>> size of operands?
>> E.g.
>>
>> def MOV8rm  : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src),
>>                "mov{b} {$src, $dst|$dst, $src}",
>>                [(set GR8:$dst, (load addr:$src))]>;
>> def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src),
>>                "mov{w} {$src, $dst|$dst, $src}",
>>                [(set GR16:$dst, (load addr:$src))]>, OpSize;
>> def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src),
>>                "mov{l} {$src, $dst|$dst, $src}",
>>                [(set GR32:$dst, (load addr:$src))]>;
> 
> We must do this, because they perform different operations.
> Specifically, the loads all read a different number of bytes (1/2/4).

OK.

>> For my target processor, only the types of operands (e.g. rm) are
>> really important. The size is encoded into the opcode, but otherwise
>> it does not affect anything except for constraints on the sizes of
>> operands. And constraint is basically that the size of both operands
>> should be the same.
> 
> Ok.
> 
>> Wouldn't it be possible and even more clean to have just one
>> description like (I use a pseudo-description here):
>>
>> def MOVrr  : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst,
>>                (i8mem|i16mem|i32mem):$src),
>>                "mov{b} {$src, $dst|$dst, $src}", []>,
>>                isSameSize($dst, $src);
> 
> We already have something like this, but it's a little more general.
> The X86 backend hasn't been converted to use it.

Any plans to do it, to make things simpler?

> This is the 'multiclass' 
> facility in tblgen:
> http://llvm.org/docs/TableGenFundamentals.html#multiclass

This is very interesting.

> Basically this lets you use one definition to implement multiple
> different instructions.  For example, most instructions in the sparc
> target come in "reg,reg" and "reg,imm" forms.  As such, it defines:
> 
> multiclass F3_12<string OpcStr, bits<6> Op3Val, SDNode OpNode> {
>    def rr  : F3_1<2, Op3Val,
>                   (ops IntRegs:$dst, IntRegs:$b, IntRegs:$c),
>                   !strconcat(OpcStr, " $b, $c, $dst"),
>                   [(set IntRegs:$dst, (OpNode IntRegs:$b, IntRegs:$c))]>;
>    def ri  : F3_2<2, Op3Val,
>                   (ops IntRegs:$dst, IntRegs:$b, i32imm:$c),
>                   !strconcat(OpcStr, " $b, $c, $dst"),
>                   [(set IntRegs:$dst, (OpNode IntRegs:$b, simm13:$c))]>;
> }
> 
> which allows it to use instructions like:
> 
> defm AND    : F3_12<"and"  , 0b000001, and>;
> defm OR     : F3_12<"or"   , 0b000010, or>;
> defm XOR    : F3_12<"xor"  , 0b000011, xor>;
> defm SLL    : F3_12<"sll"  , 0b100101, shl>;
> defm SRL    : F3_12<"srl"  , 0b100110, srl>;
> defm SRA    : F3_12<"sra"  , 0b100111, sra>;
> defm ADD    : F3_12<"add"  , 0b000000, add>;
> defm ADDCC  : F3_12<"addcc", 0b010000, addc>;
> defm ADDX   : F3_12<"addx" , 0b001000, adde>;
> defm SUB    : F3_12<"sub"  , 0b000100, sub>;
> defm SUBX   : F3_12<"subx" , 0b001100, sube>;
> defm SUBCC  : F3_12<"subcc", 0b010100, SPcmpicc>;
> ...
> 
> Each of these 'defm's expand into two instructions.
> 
>> The semantic of such a description would mean that $dst should be
>> one of GR8, GR16, GR32 and $src is one of i8mem, i16mem, i32mem with
>> the additional constraint that the sizes of both operands are the
>> same (this is checked by isSameSize predicate).
> 
> Sure.  It would be straight-forward to define a multiclass for your
> arch, in which each use of defm makes three instructions: one for each
> width.

Beautiful! This saved my day! Multiclass is an extremely useful
construct. Thanks a lot for the explanations. Multiclasses seem to
provide almost all (if not all) of the features that I was asking for.
I'll try to use them to write a short and concise definition of the
instructions for my target.
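
To check that I understood the idea, here is a rough sketch of what I
have in mind, modelled on the SPARC example above. Everything named
here (MyInst, MovLoad, the rm8/rm16/rm32 suffixes) is a hypothetical
stand-in and not taken from any existing backend; the opcode values are
simply copied from the X86 loads quoted earlier, and the operand lists
and selection patterns are left out to keep it short:

```tablegen
// Hypothetical minimal instruction class, defined only so that this
// sketch stands alone. A real backend would derive from the target's
// own instruction classes instead.
class MyInst<bits<8> opc, int width, string asm> {
  bits<8> Opcode    = opc;
  int     Width     = width;   // operand size in bits
  string  AsmString = asm;
}

// The instruction is described once; every 'defm' of this multiclass
// expands into three records, one per operand width.
multiclass MovLoad<bits<8> opc8, bits<8> opcWide> {
  def rm8  : MyInst<opc8,    8,  "mov{b} {$src, $dst|$dst, $src}">;
  def rm16 : MyInst<opcWide, 16, "mov{w} {$src, $dst|$dst, $src}">;
  def rm32 : MyInst<opcWide, 32, "mov{l} {$src, $dst|$dst, $src}">;
}

// Produces the records MOVrm8, MOVrm16 and MOVrm32.
defm MOV : MovLoad<0x8A, 0x8B>;
```

If I read the documentation correctly, the defm name is prepended to
each def name inside the multiclass, just as AND becomes ANDrr and ANDri
in the SPARC case.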

>> 2. Another related question is about instruction costs. In
>> BURG-based selectors there is usually a possibility to describe costs
>> for the instructions so that a least-cost cover can be found during
>> the instruction selection process. I have not seen any such cost
>> descriptions in TableGen files. Does it mean that it is not
>> supported?
> 
> LLVM currently assumes that all instructions have cost 1, unless told
> otherwise ('Pat' patterns which produce multiple instructions have cost
> equal to the sum of their result).  Given this, it attempts to match
> the largest patterns possible, which reduces cost.
> 
> The 'largest' metric is somewhat intricate, but you can directly affect
> it in your .td files with the 'AddedComplexity'.  If you increase it,
> it will make an instruction more likely to be matched.  However, you
> really shouldn't need to do this, except in extreme cases.  Are you
> seeing cases where the selector is missing an optimal match, or are you
> just used to other tools which aren't as smart at inferring cost as
> tblgen?

I guess I'm just used to other, more BURG-like tools. And yes, they are
not too intelligent compared to tblgen ;)
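
Just to be sure I understand the 'AddedComplexity' knob, I assume it
would be used roughly as in the sketch below. MyPat and both record
names are hypothetical and exist only to make the fragment stand alone;
in a real backend the field would be set on an instruction or 'Pat'
definition, and the value 10 is an arbitrary illustration:

```tablegen
// Hypothetical stand-in for a record that takes part in selection.
// It only mirrors the AddedComplexity field mentioned above.
class MyPat<string desc> {
  string Description     = desc;
  int    AddedComplexity = 0;   // default: no extra bias
}

// Left at the default complexity.
def LoadPlain : MyPat<"load from a register address">;

// Raise the complexity so that this record is preferred when both
// candidates could match the same tree.
let AddedComplexity = 10 in
def LoadIndexed : MyPat<"load from a base + index address">;
```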

So far, I haven't seen a real case where an optimal match is missed by
tblgen. But I have not completely converted the old BURG spec into
tblgen yet. When I'm ready, I'll compare the results and check some
corner cases. For example, I was using the tree grammar to
automatically select the best addressing mode, which tblgen doesn't
support yet as far as I understand from the documentation. There were
also interesting tree patterns for C operations of the form X OP= Y and
for increment operators. Let's see how tblgen will perform.

>> Why? As far as I understand, LLVM uses BURG/IBURG for instruction
>> selection and both of these tools support costs, AFAIK.
> 
> Nope, it uses neither.  It uses custom code built into tblgen.

OK. Now I understand. I probably assumed that tblgen was BURG-based
because there is a Burg subdirectory in llvm/utils. In older versions
of llvm it was even non-empty, which gave the impression that it is
used.

BTW, does it use the same theory as BURG/IBURG when it generates the
selector, i.e. dynamic programming + precomputed costs, etc.?  Does it
implicitly build a tree grammar or something like that? Does it try to
merge/share patterns as much as possible to speed up the recognition
phase, as is done by BURMs? I have the impression (from looking at the
produced C++ selector code) that tblgen-based selectors are not as
"precomputed" and optimal as BURG-based ones. Do you have an idea or
figures about how it compares to BURG/IBURG selectors?

Thanks again,
  Roman
