[LLVMdev] How to partition registers into different RegisterClass?

Mon Jul 25 22:27:42 PDT 2005

On Mon, 25 Jul 2005, Tzu-Chien Chiu wrote:
> But please allow me to explain the hardware in detail. I hope there is
> a more elegant way to solve it.

Sounds good!

> The hardware is a "stream processor". That is, it processes samples
> one by one. Each sample is associated with several 128-bit
> four-element vector registers, namely:
>
> * input registers - the attributes of the sample, the values of the
> registers are different and initialized for each sample before
> execution. READ-ONLY (can only be declared once by 'dcl' instruction).

Ok.

> * constant registers - sample-invariant. READ-ONLY (can only be
> defined once by 'def' instruction). All samples share the same set of
> constant register values.

Ok.  I don't think the definition of these should be represented in your 
code.  The code should just read them when needed.

> * general purpose registers - values are not initialized before the
> execution and destroyed after execution. They can be read and written.

Yup, these should be register allocated.

> * output registers - WRITE-ONLY.

And these should be explicitly defined once, also not register allocated.

> Sample program converted to pseudo-LLVM assembly (SSA):
>
>  %Vec4 = type < 4 x float>
>
>  // declare input registers and
>  // define constant register values
>  %v1 = dcl %Vec4 xyz
>  %v2 = dcl %Vec4 color
>  %c1 = def %Vec4 <1,2,3,4>
>
>  // v1, v2, c1 are not allowed to be destination register
>  // of any instruction hereafter.
>
>  %r1 = add %Vec4 v1, c1
>  %r2 = mul %Vec4 v1, c2
>  %o1 = mul %Vec4 r2, v2     // write the output register 'o1'

Here, the v1/v2/c1/o1 registers should be represented as explicit 
registers, and the GPRs should be virtual registers.  This would give you 
code that looks something like this:

%reg1024 = add v1, c1
%reg1025 = mul v1, c2
%reg1026 = mul %reg1025, v2
%o1 = mov %reg1026

The 'mov' register-to-register copy instruction will be coalesced and 
eliminated by the register allocator.  The regalloc will eliminate the 
virtual registers, assigning physical GPRs.  This is what the 'allocation 
order' is meant to cover.
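
To make the allocation-order idea concrete, here is a minimal standalone C++ sketch. It does not use the real LLVM classes: the `RegClass` struct, the register names, and `pickFree` are hypothetical stand-ins invented for illustration. The only point it tries to show is that a class can contain every register an operand may name while exposing just the GPRs to the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a register class: every register an operand may
// name is a member, but only a prefix of the list is handed to the allocator.
struct RegClass {
    std::vector<std::string> regs;   // allocatable registers are listed first
    std::size_t numAllocatable;      // regs[0 .. numAllocatable) may be assigned

    bool contains(const std::string &r) const {
        for (const std::string &m : regs)
            if (m == r) return true;
        return false;
    }
    std::vector<std::string>::const_iterator allocBegin() const {
        return regs.begin();
    }
    std::vector<std::string>::const_iterator allocEnd() const {
        return regs.begin() + static_cast<std::ptrdiff_t>(numAllocatable);
    }
};

// One class for the whole register file: R* are GPRs, V* inputs,
// C* constants, O* outputs (all names assumed for this sketch).
inline RegClass makeVec4Class() {
    RegClass rc{{"R0", "R1", "V0", "V1", "C0", "C1", "O0"}, 2};
    assert(rc.numAllocatable <= rc.regs.size());
    return rc;
}

// Return the first allocatable register not in `used`, or "" if none is free.
inline std::string pickFree(const RegClass &rc,
                            const std::set<std::string> &used) {
    for (auto it = rc.allocBegin(); it != rc.allocEnd(); ++it)
        if (used.count(*it) == 0) return *it;
    return "";
}
```

With this layout an instruction may still name C0 or V1 as an operand, since they are members of the class, but `pickFree` can only ever hand out R0 or R1.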

> I planned to partition the registers into different RegisterClasses:
> input, output, general purpose, constant, etc.
>
>  def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
>  def InputRC : RegisterClass<packed, 128, [V0, V1]>;
>  def ConstantRC : RegisterClass<packed, 128, [C0, C1]>;

The way you want to partition these is based on how the instruction set 
works.  If there is a single 'add' instruction that can operate on any of 
these registers, there should be a single register class.  If there are 
two adds (as it looks like you have below, judging by the opcode) with 
different register constraints, then you should partition the registers so 
that the register classes line up with the instruction operand 
requirements.

> def ADDgg : BinaryInst<0x51, (
>   ops GeneralPurposeRC:$dest,
>       GeneralPurposeRC:$src), "add $dest, $src">;
>
> def ADDgi : BinaryInst<0x52, (
>   ops GeneralPurposeRC:$dest,
>       InputRC:$src), "add $dest, $src">;
>
> def ADDgc : BinaryInst<0x52, (
>   ops GeneralPurposeRC:$dest,
>       ConstantRC:$src), "add $dest, $src">;
>
> The problem is: SDOperand always returns the 'type' of the value (in
> this case, 'packed', the first argument of RegisterClass<>), but not
> the 'RegisterClass'. With two 'packed' operands, the instruction
> selector doesn't know whether an ADDgg, ADDgi, or ADDgc should be
> generated (BuildMI() function).

Right.  You don't want to do this sort of partitioning.  All of the 
'computed' values should be virtual registers which will end up being 
assigned to GPRs.  The register allocator will attempt to coalesce the 
GPR into an output or input register if possible.  To allow this 
coalescing to happen, implement the TargetInstrInfo::isMoveInstr virtual 
method for your target.
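
As a rough illustration of why the allocator needs to recognise copies, here is a toy C++ model. Everything in it is an assumption made for the sketch: `MInstr`, the opcode enum, the register numbering, and the free function `isMoveInstr` are simplified stand-ins, not LLVM's MachineInstr or the actual TargetInstrInfo signature, which should be checked against the LLVM tree you build with. The program is modelled on the four-instruction example earlier in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy opcodes and instruction record; these are NOT LLVM's MachineInstr.
enum Opcode { ADD, MUL, MOV };

// Registers are plain numbers here: values >= 1024 are virtual registers,
// smaller values are physical registers (a convention assumed for this toy).
const unsigned FirstVirtual = 1024;
const unsigned V1 = 10, V2 = 11, C1 = 20, C2 = 21;
const unsigned O1 = 40;   // hypothetical physical output register

struct MInstr {
    Opcode op;
    unsigned dst;
    std::vector<unsigned> srcs;
};

// Report whether mi is a plain register-to-register copy and, if so, which
// registers it copies between. This is the information a coalescer needs.
inline bool isMoveInstr(const MInstr &mi, unsigned &src, unsigned &dst) {
    if (mi.op != MOV || mi.srcs.size() != 1) return false;
    src = mi.srcs[0];
    dst = mi.dst;
    return true;
}

// Very small coalescer: for each copy "phys = mov virt", rename virt to phys
// everywhere and drop the copy. It ignores liveness, interference, and the
// write-only nature of output registers, all of which a real allocator must
// check before merging two registers.
inline std::vector<MInstr> coalesceCopies(const std::vector<MInstr> &code) {
    std::map<unsigned, unsigned> rename;
    for (const MInstr &mi : code) {
        unsigned s = 0, d = 0;
        if (isMoveInstr(mi, s, d) && s >= FirstVirtual && d < FirstVirtual)
            rename[s] = d;
    }
    std::vector<MInstr> out;
    for (MInstr mi : code) {
        unsigned s = 0, d = 0;
        if (isMoveInstr(mi, s, d)) {
            std::map<unsigned, unsigned>::const_iterator it = rename.find(s);
            if (it != rename.end() && it->second == d)
                continue;  // the copy became "d = mov d", so drop it
        }
        if (rename.count(mi.dst)) mi.dst = rename[mi.dst];
        for (unsigned &r : mi.srcs)
            if (rename.count(r)) r = rename[r];
        out.push_back(mi);
    }
    return out;
}

// add, mul, mul, then a copy of the last virtual register into o1.
inline std::vector<MInstr> exampleProgram() {
    return {
        {ADD, 1024, {V1, C1}},
        {MUL, 1025, {V1, C2}},
        {MUL, 1026, {1025, V2}},
        {MOV, O1,   {1026}},
    };
}
```

Running `coalesceCopies(exampleProgram())` leaves three instructions, with the final multiply writing O1 directly; the first two destinations remain virtual and would still need GPRs assigned.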

> The same problem exists when there are two types of constant registers,
> floating point and integer, and each is declared 'packed' ([4xfloat]
> and [4xint]). The instruction selector doesn't know which instruction
> it should produce because the newly defined MVT type 'packed' is
> always used for all operands (registers), even if it's actually a
> [4xfloat] or [4xint].

It might make sense to add two MVT enums: one for packed integers, and one 
for packed floats?
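
A small standalone C++ sketch of what that could buy the instruction selector. The enum entries `v4f32` and `v4i32`, the `NodeKind` enum, and the opcode names are all made up for illustration and are not LLVM's actual MVT values; the sketch only shows that once the value type distinguishes the element kind, the opcode choice becomes a simple lookup.

```cpp
#include <cassert>
#include <string>

// Hypothetical value-type enum: one entry per packed element kind instead of
// a single catch-all 'packed'.
enum ValueType { v4f32, v4i32 };

// Hypothetical target-independent operations to be selected.
enum NodeKind { Add, Mul };

// Pick a machine opcode name from the operation and the operand value type.
// The opcode names (ADDf, ADDi, ...) are invented for this sketch.
inline std::string selectOpcode(NodeKind kind, ValueType vt) {
    switch (kind) {
    case Add: return vt == v4f32 ? "ADDf" : "ADDi";
    case Mul: return vt == v4f32 ? "MULf" : "MULi";
    }
    return "";  // not reached for the enumerators above
}
```

With a single 'packed' type both calls below would be indistinguishable; with two types, `selectOpcode(Add, v4f32)` and `selectOpcode(Add, v4i32)` resolve to different instructions.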

-Chris

> 2005/7/24, Chris Lattner <sabre at nondot.org>:
>> On Sat, 23 Jul 2005, Tzu-Chien Chiu wrote:
>>> 2005/7/23, Chris Lattner <sabre at nondot.org>:
>>>> What does a 'read only' register mean?  Is it a constant (e.g. returns
>>>> 1.0)?  Otherwise, how can it be a useful value?
>>>
>>> Yes, it's a constant register.
>>>
>>> Because the instruction cannot contain an immediate value, a constant
>>> value may be stored in a constant register, and it's defined _before_
>>> the program starts by API. For example:
>>>
>>>  SetConstantValue( 5, Vector4( 1, 2, 3, 4 ) ); // C5 = <1,2,3,4>
>>>  HANDLE handle = LoadCodeFromFile( filename );
>>>  SetCode( handle );  // C5 is referenced here
>>>  Execute();
>>
>> Ah, ok. In that case, you want to put all of the registers in one register
>> file, and not make the constant registers allocatable (e.g. see
>> X86RegisterInfo.td, and note how the register classes include EBP and ESP,
>> but do not register allocate them, through the definition of
>> allocation_order_end()).
>>
>> -Chris
>>
>> --
>> http://nondot.org/sabre/
>> http://llvm.org/
>>

-- 
http://nondot.org/sabre/
http://llvm.org/