<div dir="ltr"><span style="font-size:14px">Hi </span><span class="" style="font-size:14px;background-color:rgb(255,255,255)">Escha</span><span style="font-size:14px">,</span><div style="font-size:14px"><br></div><div style="font-size:14px">Thanks for your comment! Is there a specific reason not to do it this way?</div><div style="font-size:14px">I am not sure I understand your point correctly. By "just model one thread",</div><div style="font-size:14px">do you mean "only consider ONE of the 8/16 SIMD lanes that run in lock-step"?</div><div style="font-size:14px"><br></div><div style="font-size:14px">In my case, that would mean I only need to define r0~r127 as i32 registers (each r# is exactly big enough for a SIMD8 i32).</div><div style="font-size:14px">Then the register allocator never needs to allocate sub-registers and can treat each register as a whole, right?</div><div style="font-size:14px"><br></div><div style="font-size:14px">Yes, that looks really easy for divergent registers. But I think I would then lose the ability</div><div style="font-size:14px">to allocate uniform registers. Am I right? 
Is there any way to allocate uniform registers</div><div style="font-size:14px">as well as divergent registers?<div class="gmail_extra"><br></div><div class="gmail_extra">Thanks!</div><div class="gmail_extra">Ruiling</div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-08-23 0:32 GMT+08:00  <span dir="ltr">&lt;<a href="mailto:escha@apple.com" target="_blank">escha@apple.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span class=""><blockquote type="cite"><div>On Aug 22, 2016, at 6:46 AM, Ruiling Song via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br><div><div dir="ltr"><div>Hello Everyone,</div><div><br></div><div>I am trying to write a new LLVM backend target for Intel GPUs.<br>I will start by targeting the OpenCL language.</div><div>But I am not very familiar with the LLVM backend infrastructure.</div><div>I have a problem describing the RegisterInfo.<br><br></div><div>An Intel GPU launches many hardware threads to run GPGPU workloads.</div><div>Each hardware thread has 128 registers (r0-r127), each 32 bytes in size.<br></div><div>Each hardware thread may run in SIMD 8/16/32 mode, which maps to</div><div>8/16/32 OpenCL work-items. The SIMD width is chosen at</div><div>compile time (normally according to register pressure: a bigger SIMD width means higher register pressure).</div><div>Note that each instruction has its own exec-width, which may differ from the program's SIMD width.</div><div>Normally we allocate contiguous registers for a divergent value.</div><div>For example, if a program is compiled as SIMD 8, we need to allocate 4 bytes * 8 = 32 bytes</div><div>for a divergent float/i32 value. 
But a 'short' value only needs</div><div>2 bytes * 8 = 16 bytes, which is half of a 32-byte register.<br></div><div>We may also allocate 'uniform' values. A uniform value only needs a register slice of its own type size,</div><div>without multiplying by the SIMD width: a uniform float/i32 value only needs 4 bytes of a physical register.</div><div>Thus a 32-byte register can hold up to 8 different uniform float/i32 values.<br></div></div></div></blockquote><div><br></div></span><div>As a GPU backend maintainer, I strongly discourage trying to model the total register bank of the GPU in LLVM. Just model one thread. This will make things much, much easier.</div><span class=""><br></span></div><div>—escha</div></div></blockquote></div><br></div></div>
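<div style="font-size:14px">The register-size arithmetic in the original message can be checked with a short sketch. It assumes the figures stated in the thread: 128 registers of 32 bytes per hardware thread and SIMD 8/16/32. From those, it computes how much register space a divergent or a uniform value occupies. The helper names (divergent_bytes, grf_fraction, max_live_divergent, etc.) are hypothetical illustrations, not part of LLVM or any Intel GPU toolchain. The bound on live values also ignores any registers reserved by the hardware or ABI.</div>

```python
# Hypothetical helpers (illustrative names, not an LLVM API) for the
# register-footprint arithmetic, assuming 128 registers of 32 bytes each.
GRF_BYTES = 32  # bytes per register (r0-r127), as stated in the thread
NUM_GRFS = 128  # registers per hardware thread, as stated in the thread


def divergent_bytes(type_bytes: int, simd_width: int) -> int:
    """A divergent value stores one element per SIMD lane."""
    return type_bytes * simd_width


def uniform_bytes(type_bytes: int) -> int:
    """A uniform value stores a single element shared by all lanes."""
    return type_bytes


def grf_fraction(num_bytes: int) -> float:
    """Registers occupied by num_bytes (0.5 means half a register)."""
    return num_bytes / GRF_BYTES


def uniforms_per_grf(type_bytes: int) -> int:
    """How many uniform values of this type fit in one register."""
    return GRF_BYTES // uniform_bytes(type_bytes)


def max_live_divergent(type_bytes: int, simd_width: int) -> int:
    """Upper bound on simultaneously live divergent values of one type.

    Assumption: ignores registers reserved by hardware/ABI and any
    alignment or packing rules; it only divides total register bytes.
    """
    return (NUM_GRFS * GRF_BYTES) // divergent_bytes(type_bytes, simd_width)


if __name__ == "__main__":
    # Show how a wider SIMD width inflates the footprint of a divergent i32.
    for width in (8, 16, 32):
        print(f"SIMD{width}: i32 uses {grf_fraction(divergent_bytes(4, width))} "
              f"registers, at most {max_live_divergent(4, width)} live values")
```

<div style="font-size:14px">For SIMD 8, a divergent i32 fills exactly one register and a divergent short fills half of one. Moving to SIMD 16 doubles every divergent footprint, which is why a wider SIMD width raises register pressure. Uniform values do not scale with the SIMD width, so eight uniform i32 values can share one register.</div>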