[llvm-dev] How to describe the RegisterInfo?

Mon Aug 22 06:46:10 PDT 2016

Hello Everyone,

I am trying to write a new LLVM backend targeting Intel GPUs.
I would like to start by targeting the OpenCL language.
But I am not very familiar with the LLVM backend infrastructure,
and I have some problems describing the RegisterInfo.

An Intel GPU launches many hardware threads to run a GPGPU workload.
Each hardware thread has 128 registers (r0-r127), each 32 bytes wide.
Each hardware thread may run in SIMD 8/16/32 mode, which maps to
8/16/32 OpenCL work-items. The SIMD width is chosen at
compile time (normally according to register pressure; a bigger SIMD
width means higher register pressure).
Note that each instruction has its own exec-width, which may not be equal to
the program's SIMD width.
Normally we allocate contiguous registers for a divergent value.
For example, in a program compiled as SIMD 8, a divergent float/i32 value
needs 4 bytes * 8 = 32 bytes. A 'short' value
only needs 2 bytes * 8 = 16 bytes, which is half of a 32-byte register.
We may also allocate registers for 'uniform' values. A uniform value only
needs space for the type itself,
without multiplying by the SIMD width. A uniform float/i32 value only needs
4 bytes of a physical register.
Thus a 32-byte register can hold up to 8 different uniform float/i32 values.
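
The sizing rule above can be written down as a small sketch. The function
names are hypothetical and only restate the arithmetic from this mail
(32-byte registers, divergent values scaled by the SIMD width, uniform
values not scaled); this is not code from any existing backend.

```python
REG_BYTES = 32  # each register is 32 bytes wide, as described above


def value_bytes(type_bytes, simd_width, uniform=False):
    """Bytes of register space needed by one value.

    A divergent value holds one element per SIMD lane; a uniform value
    holds a single element regardless of the SIMD width.
    """
    return type_bytes if uniform else type_bytes * simd_width


def regs_needed(type_bytes, simd_width, uniform=False):
    """Number of whole 32-byte registers the value touches (rounded up)."""
    size = value_bytes(type_bytes, simd_width, uniform)
    return -(-size // REG_BYTES)  # integer ceiling division


def uniforms_per_register(type_bytes):
    """How many uniform values of this type fit in one register."""
    return REG_BYTES // type_bytes
```

For example, value_bytes(8, 8) gives 64 bytes for a divergent i64 in SIMD 8,
which is why such a value occupies two registers.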

Sometimes we also need to access a register with a stride. For example, for
a bitcast from i64 to v2i32,
we need to access the i64 register with a horizontal stride of 2.
In the example below, the i64 value is held in r10 and r11. L/H stand for
the low/high 32 bits.
The SIMD width of the program is 8, so we have 8 pairs of L/H.
r10: L H L H L H L H
r11: L H L H L H L H
The two instructions below extract the low 32-bit and high 32-bit parts.
mov(8 | M0) r12.0<1>, r10.0<8,4,2>:D
mov(8 | M0) r13.0<1>, r10.1<8,4,2>:D
(The format of a register region is RegNum.regSubNum<vertStride, width,
horzStride>:type)
(Note the regSubNum is measured in units of the register type here.)
Then r12/r13 contain the components of the result vector.
You can refer to the link below for more details on Intel GPU assembly and
register usage:
https://software.intel.com/en-us/articles/introduction-to-gen-assembly
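
To make the region addressing in the mov example concrete, here is a
minimal sketch of how I understand the <vertStride, width, horzStride>
region to select elements. The helper names are hypothetical, and the
row/column interpretation is my reading of the linked article, not
something taken from a compiler.

```python
def region_elements(sub_reg, vert_stride, width, horz_stride, exec_size):
    """Element indices (in units of the region type) read by each channel.

    The region is treated as rows of `width` elements: elements inside a
    row are `horz_stride` apart, and rows start `vert_stride` apart.
    """
    if exec_size % width != 0:
        raise ValueError("exec_size must be a multiple of width")
    return [
        sub_reg + (ch // width) * vert_stride + (ch % width) * horz_stride
        for ch in range(exec_size)
    ]


def to_reg_and_sub(base_reg, elem, type_bytes, reg_bytes=32):
    """Map an element index counted from base_reg to (register, sub-register)."""
    byte_off = elem * type_bytes
    return base_reg + byte_off // reg_bytes, (byte_off % reg_bytes) // type_bytes
```

With the operands from the example, region_elements(0, 8, 4, 2, 8) selects
the even dwords (the L parts) and region_elements(1, 8, 4, 2, 8) selects the
odd dwords (the H parts); elements 8 and above fall into r11.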

I notice that the hardware encoding of a register is 16 bits wide. That is
not enough to encode all the
register region parameters (regNum, type, hstride, vstride, width, ...) in
RegisterInfo.td, and I am not sure
which place is reasonable for holding this stride/type/width information
for a physical register.
Maybe some other .cpp file is more suitable than the RegisterInfo.td file?
I ask because I need to change the register
region parameters in the bitcast instruction (from qword with hstride 1 to
dword with hstride 2).
At which stage is it suitable to implement such bitcast logic? After
register allocation?

The detailed hardware spec is located at:
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol07-3d_media_gpgpu_3.pdf
Page 921 describes the detailed instruction encoding format.
It needs (regFile, regNum, subRegNum, width, type, addrMode, hStride,
vStride) to describe a register.
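
As an illustration of what I mean by changing the region parameters for a
bitcast, here is a rough sketch. RegOperand and reinterpret_part are
hypothetical names; the fields mirror the tuple listed above, except that
the type is simplified to its size in bytes. The qword source region
<4,4,1> used in the usage note is my assumption for a SIMD 8 i64 value.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegOperand:
    reg_file: str     # e.g. "GRF" (placeholder string, not a real encoding)
    reg_num: int
    sub_reg_num: int  # in units of the element type
    width: int
    type_bytes: int   # element size in bytes, standing in for the type
    addr_mode: str    # e.g. "direct" (placeholder string)
    h_stride: int
    v_stride: int


def reinterpret_part(op, new_type_bytes, part):
    """View one narrower piece of each element of `op`.

    part 0 is the lowest piece of every element, part 1 the next, etc.
    All offsets and strides are rescaled to the narrower element unit.
    """
    if op.type_bytes % new_type_bytes != 0:
        raise ValueError("new type must evenly divide the old type")
    ratio = op.type_bytes // new_type_bytes
    if not 0 <= part < ratio:
        raise ValueError("part out of range")
    return replace(
        op,
        type_bytes=new_type_bytes,
        sub_reg_num=op.sub_reg_num * ratio + part,
        h_stride=op.h_stride * ratio,
        v_stride=op.v_stride * ratio,
    )
```

Starting from RegOperand("GRF", 10, 0, 4, 8, "direct", 1, 4), part 0 gives
sub-register 0 with strides (8, 4, 2) and part 1 gives sub-register 1 with
the same strides, which matches the two mov sources shown earlier.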

I have attached the first version of my RegisterInfo.td.
I also have a question about the attached file: do I
have to define different SubRegIndex records
like the ones below to make TableGen work correctly?

foreach Index = 0-15 in {
  // used as SubRegIndex when declaring gpr_d_simd8
  def subd#Index : SubRegIndex<32, !shl(Index, 5)>;
  // used as SubRegIndex when declaring gpr_w_simd8
  def subw#Index : SubRegIndex<16, !shl(Index, 4)>;
  ...
}
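
For reference, the (size, offset) pairs that the loop above generates can
be tabulated with a small sketch. The helper names are hypothetical; the
only facts used are that SubRegIndex takes a size and an offset in bits and
that one register is 32 bytes = 256 bits.

```python
REG_BITS = 32 * 8  # one 32-byte register


def sub_reg_indices(prefix, size_bits, count):
    """(name, size, offset) triples mirroring SubRegIndex<size, offset>.

    The offset is Index shifted left by log2(size), as in the !shl calls.
    """
    shift = size_bits.bit_length() - 1  # log2 for power-of-two sizes
    return [(f"{prefix}{i}", size_bits, i << shift) for i in range(count)]


def fits_in_one_register(entry):
    """True if the sub-register lies entirely inside the first 256 bits."""
    _name, size, offset = entry
    return offset + size <= REG_BITS
```

With 16 indices, all of subw0..subw15 fit inside one register, while only
subd0..subd7 do; subd8..subd15 start at bit 256 or later, so they would only
make sense for a register that is wider than 32 bytes.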

If anything I said is unclear, just reply to this mail. Thanks for any help!

Thanks!
Ruiling
-------------- next part --------------
A non-text attachment was scrubbed...
Name: IntelGPURegisterInfo.td
Type: application/octet-stream
Size: 5907 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160822/7e9761ee/attachment.obj>