[LLVMdev] How to partition registers into different RegisterClass?
Chris Lattner
sabre at nondot.org
Mon Jul 25 22:27:42 PDT 2005
On Mon, 25 Jul 2005, Tzu-Chien Chiu wrote:
> But please allow me to explain the hardware in detail. I hope there is
> a more elegant way to solve it.
Sounds good!
> The hardware is a "stream processor". That is, it processes samples
> one by one. Each sample is associated with several 128-bit
> four-element vector registers, namely:
>
> * input registers - the attributes of the sample, the values of the
> registers are different and initialized for each sample before
> execution. READ-ONLY (can only be declared once by 'dcl' instruction).
Ok.
> * constant registers - sample-invariant. READ-ONLY (can only be
> defined once by 'def' instruction). All samples share the same set of
> constant register values.
Ok. I don't think the definition of these should be represented in your
code. The code should just read them when needed.
> * general purpose registers - values are not initialized before the
> execution and destroyed after execution. They can be read and written.
Yup, these should be register allocated.
> * output registers - WRITE-ONLY.
And these should be explicitly defined once, also not register allocated.
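To make the split concrete, here is a rough sketch in plain C++. It is a toy
model, not LLVM code: the RegKind, PhysReg, registerFile and allocatable names
and the register counts are all made up for illustration. It shows a single
register file in which only the general purpose registers are handed to the
allocator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: none of these names come from LLVM.
enum class RegKind { Input, Constant, General, Output };

struct PhysReg {
  std::string name;
  RegKind kind;
};

// One register file holding every register an instruction can name.
// The register counts are arbitrary, chosen for the example.
inline std::vector<PhysReg> registerFile() {
  return {{"v0", RegKind::Input},    {"v1", RegKind::Input},
          {"c0", RegKind::Constant}, {"c1", RegKind::Constant},
          {"r0", RegKind::General},  {"r1", RegKind::General},
          {"o0", RegKind::Output}};
}

// Only general purpose registers are offered to the allocator. Input,
// constant and output registers stay in the file so instructions can
// reference them explicitly, but they are never picked for a value.
inline std::vector<std::string> allocatable(const std::vector<PhysReg> &file) {
  std::vector<std::string> out;
  for (const PhysReg &reg : file)
    if (reg.kind == RegKind::General)
      out.push_back(reg.name);
  return out;
}
```

Calling allocatable(registerFile()) yields just the two GPR names, while the
other registers remain nameable by the code that reads or writes them.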
> Sample program converted to pseudo-LLVM assembly (SSA):
>
> %Vec4 = type < 4 x float>
>
> // declare input registers and
> // define constant register values
> %v1 = dcl %Vec4 xyz
> %v2 = dcl %Vec4 color
> %c1 = def %Vec4 <1,2,3,4>
>
> // v1, v2, c1 are not allowed to be destination register
> // of any instruction hereafter.
>
> %r1 = add %Vec4 v1, c1
> %r2 = mul %Vec4 v1, c2
> %o1 = mul %Vec4 r2, v2 // write the output register 'o1'
Here, the v1/v2/c1/o1 registers should be represented as explicit
registers, and the GPRs should be virtual registers. This would give you
code that looks something like this:
%reg1024 = add v1, c1
%reg1025 = mul v1, c2
%reg1026 = mul %reg1025, %v2
%o1 = mov %reg1026
The 'mov' register-to-register copy instruction will be coalesced and
eliminated by the register allocator. The regalloc will eliminate the
virtual registers, assigning physical GPRs. This is what the 'allocation
order' is meant to cover.
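As a rough illustration of what coalescing the copy means, here is a toy C++
sketch. It is not how the LLVM allocator is implemented: the Inst struct, the
'%' naming convention for virtual registers and the coalesceCopies function are
assumptions made up for this example, and it only handles the simplest case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: dst = op srcs...  (not LLVM's MachineInstr).
struct Inst {
  std::string op;
  std::string dst;
  std::vector<std::string> srcs;
};

// Assumed convention for this sketch: virtual registers start with '%'.
inline bool isVirtual(const std::string &reg) {
  return !reg.empty() && reg[0] == '%';
}

// Fold "phys = mov %vreg" into the instruction that defines %vreg.
// Only the simple case is handled: %vreg has exactly one definition,
// that definition comes before the copy, and the copy is its only use.
// A real coalescer must also check that phys is not live in between.
inline std::vector<Inst> coalesceCopies(std::vector<Inst> code) {
  for (std::size_t i = 0; i < code.size(); ++i) {
    if (code[i].op != "mov" || code[i].srcs.size() != 1) continue;
    if (isVirtual(code[i].dst) || !isVirtual(code[i].srcs[0])) continue;
    const std::string vreg = code[i].srcs[0];
    const std::string phys = code[i].dst;

    int defs = 0, uses = 0;
    std::size_t defIdx = 0;
    for (std::size_t j = 0; j < code.size(); ++j) {
      if (code[j].dst == vreg) { ++defs; defIdx = j; }
      for (const std::string &src : code[j].srcs)
        if (src == vreg) ++uses;
    }
    if (defs != 1 || uses != 1 || defIdx > i) continue;

    code[defIdx].dst = phys;       // the defining instruction writes phys
    code.erase(code.begin() + i);  // the copy is now redundant
    --i;                           // re-examine the slot we just vacated
  }
  return code;
}
```

Fed a sequence that ends in a multiply into a virtual register followed by a
'mov' of that register into o1, the sketch rewrites the multiply to write o1
directly and drops the 'mov'.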
> I planned to partition the register into different RegisterClass:
> input, output, general purpose, constant, etc.
>
> def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
> def InputRC : RegisterClass<packed, 128, [V0, V1]>;
> def ConstantRC : RegisterClass<packed, 128, [C0, C1]>;
The way you want to partition these is based on how the instruction set
works. If there is a single 'add' instruction that can operate on any of
these registers, there should be a single register class. If there are
two adds (as it looks like you have below, judging by the opcode) with
different register constraints, then you should partition the registers so
that the register classes line up with the instruction operand
requirements.
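One way to picture this rule is the toy C++ sketch below. It is not TableGen
and not LLVM's API: RegClass, InstDesc and operandsLegal are hypothetical names
for this example. Each operand of an instruction names the register class it
accepts, so a single 'add' that takes any register needs only one class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy register class: a named set of registers.
struct RegClass {
  std::string name;
  std::vector<std::string> regs;
};

inline bool contains(const RegClass &rc, const std::string &reg) {
  return std::find(rc.regs.begin(), rc.regs.end(), reg) != rc.regs.end();
}

// Toy instruction description: one register class per operand.
struct InstDesc {
  std::string mnemonic;
  std::vector<RegClass> operandClasses;
};

// An operand list is legal when every operand belongs to the class
// that the instruction description requires at that position.
inline bool operandsLegal(const InstDesc &desc,
                          const std::vector<std::string> &operands) {
  if (operands.size() != desc.operandClasses.size()) return false;
  for (std::size_t i = 0; i < operands.size(); ++i)
    if (!contains(desc.operandClasses[i], operands[i])) return false;
  return true;
}
```

If the hardware had separate adds with different operand restrictions, each
would get its own InstDesc whose operand classes list only the registers that
encoding accepts.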
> def ADDgg : BinaryInst<0x51, (
> ops GeneralPurposeRC:$dest,
> GeneralPurposeRC:$src), "add $dest, $src">;
>
> def ADDgi : BinaryInst<0x52, (
> ops GeneralPurposeRC:$dest,
> InputRC:$src), "add $dest, $src">;
>
> def ADDgc : BinaryInst<0x52, (
> ops GeneralPurposeRC:$dest,
> ConstantRC:$src), "add $dest, $src">;
>
> The problem is: SDOperand always returns the 'type' of the value (in
> this case, 'packed', the first argument of RegisterClass<>), but not
> the 'RegisterClass'. With two 'packed' operands, the instruction
> selector doesn't know whether an ADDgg, ADDgi, or ADDgc should be
> generated (BuildMI() function).
Right. You don't want to do this sort of partitioning. All of the
'computed' values should be virtual registers, which will end up being
assigned to GPRs. The register allocator will attempt to coalesce the
GPR into an output or input register if possible. To allow this
coalescing to happen, implement the TargetInstrInfo::isMoveInstr virtual
method for your target.
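The idea behind such a hook can be sketched in standalone C++ as below. This is
a toy stand-in, not the real interface: ToyInst and ToyOpcode are made up, and
the actual method works on LLVM's own machine instruction type, so check the
TargetInstrInfo header for the exact signature. The point is only that the
target tells the allocator which instructions are plain register copies and
which registers they copy between.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a machine instruction; not LLVM's MachineInstr.
enum class ToyOpcode { Add, Mul, Mov };

struct ToyInst {
  ToyOpcode opcode;
  unsigned dstReg;
  std::vector<unsigned> srcRegs;
};

// Returns true when inst is a plain register-to-register copy and, in
// that case, reports the source and destination registers. The out
// parameters are left untouched for any other instruction.
inline bool isMoveInstr(const ToyInst &inst, unsigned &srcReg,
                        unsigned &dstReg) {
  if (inst.opcode != ToyOpcode::Mov || inst.srcRegs.size() != 1)
    return false;
  srcReg = inst.srcRegs[0];
  dstReg = inst.dstReg;
  return true;
}
```

With this information a coalescer can try to give the source and destination
of each copy the same register and then delete the copy.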
> The same problem exists when there are two types of constant registers,
> floating point and integer, and each is declared 'packed' ([4xfloat]
> and [4xint]). The instruction selector doesn't know which instruction
> it should produce because the newly defined MVT type 'packed' is
> always used for all operands (registers), even if it's actually a
> [4xfloat] or [4xint].
It might make sense to add two MVT enums: one for packed integers, and one
for packed floats?
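A minimal sketch of why two enums would help, in standalone C++. The enum
values and opcode names here are hypothetical, not existing MVT entries or real
instructions. Once the value type distinguishes packed floats from packed
integers, the selector can choose the opcode from the type alone.

```cpp
#include <cassert>

// Hypothetical value types: one per packed element kind.
enum class ValueType { PackedF32x4, PackedI32x4 };

// Hypothetical opcodes for the float and integer vector adds.
enum class Opcode { ADDf, ADDi };

// With distinct value types, selecting the add is a direct mapping.
inline Opcode selectAdd(ValueType vt) {
  return vt == ValueType::PackedF32x4 ? Opcode::ADDf : Opcode::ADDi;
}
```

With a single 'packed' type both cases would look identical at this point,
which is the ambiguity described above.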
-Chris
> 2005/7/24, Chris Lattner <sabre at nondot.org>:
>> On Sat, 23 Jul 2005, Tzu-Chien Chiu wrote:
>>> 2005/7/23, Chris Lattner <sabre at nondot.org>:
>>>> What does a 'read only' register mean? Is it a constant (e.g. returns
>>>> 1.0)? Otherwise, how can it be a useful value?
>>>
>>> Yes, it's a constant register.
>>>
>>> Because the instruction cannot contain an immediate value, a constant
>>> value may be stored in a constant register, and it's defined _before_
>>> the program starts by API. For example:
>>>
>>> SetConstantValue( 5, Vector4( 1, 2, 3, 4 ) ); // C5 = <1,2,3,4>
>>> HANDLE handle = LoadCodeFromFile( filename );
>>> SetCode( handle ); // C5 is referenced here
>>> Execute();
>>
>> Ah, ok. In that case, you want to put all of the registers in one register
>> file, and not make the constant registers allocatable (e.g. see
>> X86RegisterInfo.td, and note how the register classes include EBP and ESP
>> but do not register allocate them, through the definition of
>> allocation_order_end()).
>>
>> -Chris
>>
>> --
>> http://nondot.org/sabre/
>> http://llvm.org/
>>
-Chris
--
http://nondot.org/sabre/
http://llvm.org/