[llvm-dev] How to describe the RegisterInfo?
llvm-dev at lists.llvm.org
Mon Aug 22 09:32:57 PDT 2016
> On Aug 22, 2016, at 6:46 AM, Ruiling Song via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hello Everyone,
> I am trying to make a new LLVM backend target for Intel GPU.
> I would start by targeting OpenCL first.
> But I am not quite familiar with LLVM backend infrastructure.
> I am having trouble describing the RegisterInfo.
> An Intel GPU launches many hardware threads to run a GPGPU workload.
> Each hardware thread has 128 registers (r0-r127), each 32 bytes wide.
> Each hardware thread may run in SIMD 8, 16, or 32 mode, which maps to
> 8, 16, or 32 OpenCL work-items. The SIMD width is chosen at
> compile time (normally according to register pressure; a larger SIMD width means higher register pressure).
> Note that each instruction has its own exec-width, which may differ from the program's SIMD width.
> Normally we allocate contiguous registers for a divergent value.
> For example, in a program compiled as SIMD 8, a divergent float/i32 value needs
> 4 bytes * 8 = 32 bytes, i.e. one full register. A 'short' value
> needs only 2 bytes * 8 = 16 bytes, which is half of a 32-byte register.
> We may also allocate registers for a 'uniform' value. A uniform value needs only a type-sized slot,
> not multiplied by the SIMD width: a uniform float/i32 value needs only 4 bytes of a physical register.
> Thus one 32-byte register can hold up to 8 different uniform float/i32 values.
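The sizing arithmetic in the quoted text can be sketched as follows. This is only an illustration of the numbers given above; the names REG_BYTES, value_bytes, and registers_needed are hypothetical and do not come from LLVM or any Intel tool.

```python
REG_BYTES = 32  # size of one register, as stated in the post


def value_bytes(type_bytes, simd_width, uniform=False):
    """Bytes occupied by one value in a single hardware thread.

    A divergent value needs one element per SIMD lane; a uniform value
    needs only one element regardless of the SIMD width.
    """
    return type_bytes if uniform else type_bytes * simd_width


def registers_needed(type_bytes, simd_width, uniform=False):
    """Whole 32-byte registers touched by the value (rounded up)."""
    size = value_bytes(type_bytes, simd_width, uniform)
    return -(-size // REG_BYTES)  # ceiling division


if __name__ == "__main__":
    for name, type_bytes, uniform in [
        ("divergent i32", 4, False),
        ("divergent short", 2, False),
        ("divergent i64", 8, False),
        ("uniform i32", 4, True),
    ]:
        size = value_bytes(type_bytes, 8, uniform)
        regs = registers_needed(type_bytes, 8, uniform)
        print(f"SIMD8 {name}: {size} bytes, {regs} register(s)")
```

Under these assumptions a SIMD 8 divergent i64 takes two registers, which matches the r10/r11 pair used in the strided-access example further down.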
As a GPU backend maintainer, I strongly discourage trying to model the total register bank of the GPU in LLVM. Just model one thread. This will make things much, much easier.
> Sometimes we also need to access a register in a strided way. For a bitcast from i64 to v2i32,
> we need to access the i64 register with a horizontal stride of 2.
> In the example below, the i64 value is held in r10 and r11. L/H stand for the low/high 32 bits.
> The SIMD width of the program is SIMD 8, so we have 8 L/H pairs.
> r10: L H L H L H L H
> r11: L H L H L H L H
> The two instructions below extract the low 32-bit and high 32-bit parts.
> mov(8 | M0) r12.0<1>, r10.0<8,4,2>:D
> mov(8 | M0) r13.0<1>, r10.1<8,4,2>:D
> (The format of a register region is RegNum.regSubNum<vertStride, width, horzStride>:type)
> (Note the regSubNum is measured in units of the register type here.)
> r12 and r13 then contain the result vector components.
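To make the region notation concrete, here is a minimal sketch that expands a region into per-channel byte offsets and replays the two mov instructions on a simulated register file. It assumes the usual reading of the region triple (channels are laid out in rows of `width` elements, rows are `vertStride` elements apart, and elements within a row are `horzStride` elements apart) and a little-endian layout so that the low dword comes first, as in the L/H picture. The helper names are hypothetical.

```python
import struct

REG_BYTES = 32  # one register is 32 bytes


def region_byte_offsets(reg, subreg, vstride, width, hstride,
                        elem_bytes, exec_size):
    """Byte offset into the register file read by each execution channel.

    Strides and subreg are measured in elements of elem_bytes, matching
    the note that regSubNum is in units of the register type.
    """
    base = reg * REG_BYTES + subreg * elem_bytes
    offsets = []
    for channel in range(exec_size):
        row, col = divmod(channel, width)
        offsets.append(base + (row * vstride + col * hstride) * elem_bytes)
    return offsets


def read_region(grf, offsets, elem_bytes=4):
    """Read one little-endian element at each offset."""
    return [int.from_bytes(grf[o:o + elem_bytes], "little") for o in offsets]


# Simulated register file for one hardware thread: 128 registers of 32 bytes.
grf = bytearray(128 * REG_BYTES)

# Store eight i64 values in r10-r11: low dword = i, high dword = 0x1000 + i.
for i in range(8):
    value = ((0x1000 + i) << 32) | i
    struct.pack_into("<Q", grf, 10 * REG_BYTES + 8 * i, value)

# mov(8 | M0) r12.0<1>, r10.0<8,4,2>:D  -> low halves
low = read_region(grf, region_byte_offsets(10, 0, 8, 4, 2, 4, 8))
# mov(8 | M0) r13.0<1>, r10.1<8,4,2>:D  -> high halves
high = read_region(grf, region_byte_offsets(10, 1, 8, 4, 2, 4, 8))

if __name__ == "__main__":
    print([hex(v) for v in low])
    print([hex(v) for v in high])
```

With a width of 4 and a vertical stride of 8, the second row of the region starts 32 bytes after the first, so a single source operand named r10 reads across both r10 and r11.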
> You can refer to the link below for more details on Intel GPU assembly and register usage:
> https://software.intel.com/en-us/articles/introduction-to-gen-assembly