[llvm-dev] How to describe the RegisterInfo?

Ruiling Song via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 22 20:07:57 PDT 2016


Hi Escha,

Thanks for your comment! Is there a specific reason you avoid modeling
it this way?
I am not sure I understand your point correctly. By "just model
one thread",
do you mean "consider only ONE of the 8/16 work-items (lanes) that run in
lock-step"?

In my case, that might mean I only need to define r0~r127 as
i32 registers (each r# is just large enough for a SIMD8 i32 value).
Then the register allocator never needs to allocate sub-registers;
it just treats each register as a whole. Is that right?

Yes, that looks really easy for divergent registers. But I think I would
then lose the ability
to allocate uniform registers. Am I right? Is there any way to allocate
uniform registers
as well as divergent registers?
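
To make the sizes concrete, here is a small Python sketch of the arithmetic
from my original message (quoted below). The function and constant names are
made up for illustration and are not LLVM APIs; only the numbers (128
registers of 32 bytes each, SIMD 8/16/32) come from the hardware description.

```python
# Illustrative only: names are hypothetical, numbers are from the quoted message.
GRF_BYTES = 32   # each hardware register (r0..r127) is 32 bytes
NUM_GRF = 128    # registers per hardware thread


def divergent_bytes(elem_bytes, simd_width):
    """Bytes needed for a divergent value: one element per SIMD lane."""
    return elem_bytes * simd_width


def uniform_bytes(elem_bytes):
    """Bytes needed for a uniform value: one element, no SIMD multiplier."""
    return elem_bytes


def registers_spanned(nbytes):
    """Number of 32-byte registers a value of nbytes touches (rounded up)."""
    return -(-nbytes // GRF_BYTES)


def uniforms_per_register(elem_bytes):
    """How many uniform values of this element size fit in one register."""
    return GRF_BYTES // uniform_bytes(elem_bytes)
```

With these numbers, a SIMD 8 i32 value fills exactly one register, a SIMD 8
short fills half of one, a SIMD 16 i32 spans two, and eight uniform i32
values fit in a single register. That packing is what I would lose if every
r# were modeled only as a whole i32 register.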

Thanks!
Ruiling

2016-08-23 0:32 GMT+08:00 <escha at apple.com>:

>
> On Aug 22, 2016, at 6:46 AM, Ruiling Song via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hello Everyone,
>
> I am trying to write a new LLVM backend target for Intel GPUs.
> I would start by targeting the OpenCL language first.
> But I am not very familiar with the LLVM backend infrastructure.
> I have some problems describing the RegisterInfo.
>
> An Intel GPU launches many hardware threads to run a GPGPU workload.
> Each hardware thread has 128 registers (r0-r127), each 32 bytes
> in size.
> Each hardware thread may run in SIMD 8/16/32 mode, which maps to
> 8/16/32 OpenCL work-items. The SIMD width is chosen at
> compile time (normally according to register pressure; a bigger SIMD
> width means higher register pressure).
> Note that each instruction has its own exec-width, which may not be equal to
> the program's SIMD width.
> Normally we would allocate contiguous registers for a divergent value.
> For example, if a program is compiled as SIMD 8, we need to allocate
> 4 bytes * 8 = 32 bytes
> for a divergent float/i32 value. But a 'short'
> value
> needs only 2 bytes * 8 = 16 bytes, that is, half of a 32-byte register.
> We may also allocate registers for 'uniform' values; a uniform value needs
> only a type-sized register,
> without multiplying by the SIMD width. A uniform float/i32 value needs only
> 4 bytes of a physical register.
> Thus a 32-byte register can hold up to 8 different uniform float/i32
> values.
>
>
> As a GPU backend maintainer, I strongly discourage trying to model the
> total register bank of the GPU in LLVM. Just model one thread. This will
> make things much, much easier.
>
>
> —escha
>