[llvm-dev] How to describe the RegisterInfo?
Ruiling Song via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 24 01:47:58 PDT 2016
2016-08-24 1:32 GMT+08:00 Tom Stellard <tom at stellard.net>:
> On Mon, Aug 22, 2016 at 09:46:10PM +0800, Ruiling Song via llvm-dev wrote:
> > Hello Everyone,
> > I am trying to write a new LLVM backend target for Intel GPUs.
> > I would like to start by targeting OpenCL.
> > But I am not very familiar with the LLVM backend infrastructure.
> > I am having trouble describing the RegisterInfo.
> > Intel GPUs launch many hardware threads to run GPGPU workloads.
> > Each hardware thread has 128 registers (r0-r127), each 32 bytes in
> > size.
> > Each hardware thread may run in SIMD 8/16/32 mode, which maps to
> > 8/16/32 OpenCL work-items. The SIMD width is chosen at
> > compile time (normally according to register pressure; a bigger SIMD
> > width means higher register pressure).
> > Note that each instruction has its own exec-width, which may not be
> > equal to the program SIMD width.
> > Normally we allocate contiguous registers for a divergent value.
> > For example, if a program is compiled as SIMD 8, we need to allocate
> > 4 bytes * 8 = 32 bytes
> > for a divergent float/i32 value. A 'short' value
> > only needs 2 bytes * 8 = 16 bytes, which is half of a 32-byte register.
> > We may also allocate registers for a 'uniform' value. A uniform value
> > only needs a type-sized register,
> > without multiplying by the SIMD width. A uniform float/i32 value only
> > needs 4 bytes of a physical register.
> > Thus a 32-byte register can hold up to 8 different uniform float/i32
> > values.
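To make the arithmetic above concrete, here is a minimal Python sketch of the register footprint of a value. The helper names (`value_bytes`, `grf_count`) are made up for illustration and are not part of LLVM or any Intel tool; the only facts used are the 32-byte register size and the SIMD widths described above.

```python
GRF_BYTES = 32  # size of one register (r0-r127), as described above


def value_bytes(type_bytes, simd_width, uniform=False):
    """Bytes of register space needed by one value.

    A divergent value needs one element per SIMD lane; a uniform value
    needs a single element regardless of the SIMD width.
    """
    return type_bytes if uniform else type_bytes * simd_width


def grf_count(type_bytes, simd_width, uniform=False):
    """Number of 32-byte registers touched by the value, rounded up."""
    return -(-value_bytes(type_bytes, simd_width, uniform) // GRF_BYTES)


divergent_i32 = value_bytes(4, 8)                # SIMD 8 float/i32
divergent_i16 = value_bytes(2, 8)                # SIMD 8 short
uniform_i32 = value_bytes(4, 8, uniform=True)    # uniform float/i32
uniforms_per_grf = GRF_BYTES // uniform_i32      # uniform i32 values per register
```

Under the same assumptions, `grf_count(8, 8)` gives 2 for a divergent i64 value in SIMD 8, i.e. a pair of adjacent registers.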
> > Sometimes we also need to access a register in a strided way. For a
> > bitcast from i64 to v2i32,
> > we need to access the i64 register with a horizontal stride of 2.
> > In the example below, the i64 value is held in r10 and r11. L/H stands
> > for the low/high 32 bits.
> > The SIMD width of the program is 8, so we have 8 pairs of L/H.
> > r10: L H L H L H L H
> > r11: L H L H L H L H
> > The two instructions below extract the low 32-bit and high 32-bit parts.
> > mov(8 | M0) r12.0<1>, r10.0<8,4,2>:D
> > mov(8 | M0) r13.0<1>, r10.1<8,4,2>:D
> > (The format of a register region is RegNum.regSubNum<vertStride, width,
> > horzStride>:type)
> > (Note the regSubNum is measured in units of the register type here.)
> > Then r12/r13 contain the result vector components.
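The region notation can be unrolled by hand. The sketch below assumes the usual reading of a region: channels are laid out in rows of `width` elements, rows are `vertStride` elements apart, and elements within a row are `horzStride` elements apart. It lists which register and byte offset each of the 8 channels reads. `region_elements` is a hypothetical helper, not an existing API.

```python
GRF_BYTES = 32  # one register is 32 bytes


def region_elements(reg_num, sub_reg, vstride, width, hstride,
                    type_bytes, exec_size):
    """Return (register, byte offset) read by each channel of a region.

    sub_reg and the strides are measured in units of the element type.
    """
    base = reg_num * GRF_BYTES + sub_reg * type_bytes
    result = []
    for channel in range(exec_size):
        row, col = divmod(channel, width)
        addr = base + (row * vstride + col * hstride) * type_bytes
        result.append(divmod(addr, GRF_BYTES))  # (register, byte offset)
    return result


# r10.0<8,4,2>:D and r10.1<8,4,2>:D with an exec-width of 8
low = region_elements(10, 0, 8, 4, 2, 4, 8)
high = region_elements(10, 1, 8, 4, 2, 4, 8)
```

With these inputs, `low` touches byte offsets 0, 8, 16, 24 of r10 and then of r11 (the L dwords), and `high` touches offsets 4, 12, 20, 28 of each (the H dwords).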
> > You can refer to the link below for more details on Intel GPU assembly
> > and register usage:
> > https://software.intel.com/en-us/articles/introduction-to-gen-assembly
> > I notice the hardware encoding of a register is 16 bits. That is not
> > enough to encode all the
> > register region parameters (regNum, type, hstride, vstride, width, ...)
> > in RegisterInfo.td. And I am not sure
> > which is the reasonable place to hold this stride/type/width information
> > for a physical register.
> > Maybe some other .cpp file is more suitable than the RegisterInfo.td
> > file? I ask because I need to change the register
> > region parameters in the bitcast instruction (from qword with hstride 1
> > to dword with hstride 2).
> > At which stage is it suitable to implement such bitcast logic? After
> > register allocation?
> I would recommend encoding some of the register region parameters as part
> of the instruction rather than using the register encoding, because
> something like 'width' seems more like a property of the instruction
> than of the register to me.
Hi Tom,
Thanks for your suggestion. I agree that some region parameters need to be
part of the instruction descriptor. But it is a little hard for me to point
out which should go to the instruction descriptor and which should be
declared in RegisterInfo.td. My current idea was to describe
uniform/non-uniform registers in RegisterInfo.td, while the other
register region parameters (like stride etc.) are left to the instruction
descriptor. The SIMD width of the
compiled program is used to determine the width of the non-uniform register
(normally 8 lanes or 16 lanes),
so I think this should be included in RegisterInfo.td. If a value is
non-uniform, I would assign a non-uniform
register class to it. I am not sure whether this can be easily done in LLVM.
I don't know if there is any other possible way to do it instead of
describing uniform/non-uniform registers in the RegisterInfo.td file. Please
share with me if you have ideas on how to allocate non-uniform registers if
it is not handled in RegisterInfo.td.
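
As an illustration of the split discussed here, the sketch below keeps the uniform/non-uniform distinction as a property that would pick the register class, and carries the region parameters on the operand, so the i64 to v2i32 bitcast becomes a rewrite of the operand's region. All names (`RegionOperand`, `bitcast_half`, `fmt`) are hypothetical, and the starting qword region `<4,4,1>` for a SIMD 8 i64 value is an assumption; the thread only states that it has hstride 1.

```python
from dataclasses import dataclass, replace

TYPE_SUFFIX = {8: "Q", 4: "D"}  # qword / dword type suffixes


@dataclass(frozen=True)
class RegionOperand:
    reg_num: int
    sub_reg: int          # in units of the element type
    vstride: int
    width: int
    hstride: int
    type_bytes: int
    uniform: bool = False  # hypothetical: would select the register class


def bitcast_half(op, part):
    """View op as elements of half the size; part 0 = low, 1 = high.

    Halving the element size doubles every offset measured in elements,
    so sub_reg and both strides are doubled.
    """
    if part not in (0, 1):
        raise ValueError("part must be 0 (low) or 1 (high)")
    return replace(op,
                   sub_reg=op.sub_reg * 2 + part,
                   vstride=op.vstride * 2,
                   hstride=op.hstride * 2,
                   type_bytes=op.type_bytes // 2)


def fmt(op):
    """Print the operand in the notation used earlier in the thread."""
    return (f"r{op.reg_num}.{op.sub_reg}"
            f"<{op.vstride},{op.width},{op.hstride}>:{TYPE_SUFFIX[op.type_bytes]}")


i64_value = RegionOperand(reg_num=10, sub_reg=0, vstride=4, width=4,
                          hstride=1, type_bytes=8)   # assumed qword region
low_src = fmt(bitcast_half(i64_value, 0))
high_src = fmt(bitcast_half(i64_value, 1))
```

Under these assumptions `low_src` and `high_src` come out as the two source operands of the mov instructions in the original example, which suggests the bitcast needs no new physical register, only a different region on the same one.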