<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2016-08-24 1:32 GMT+08:00 Tom Stellard <span dir="ltr"><<a href="mailto:tom@stellard.net" target="_blank">tom@stellard.net</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Mon, Aug 22, 2016 at 09:46:10PM +0800, Ruiling Song via llvm-dev wrote:<br>

> Hello Everyone,<br>

><br>

> I am trying to make a new LLVM backend target for Intel GPU.<br>

> I would start from targeting OpenCL language first.<br>

> But I am not quite familiar with LLVM backend infrastructure.<br>

> I have some problem on describing the RegisterInfo.<br>

><br>

> Intel GPU launches lots of hardware threads to do GPGPU workload.<br>

> Each hardware thread has 128 registers(r0-r127), with each one of size 32<br>

> byte.<br>

> Each hardware thread may run in SIMD 8/16/32 way, which maps to<br>

> 8/16/32 OpenCL working items. And the SIMD width is chosen at<br>

> compile time (normally chosen according to register pressure, bigger simd<br>

> width means bigger register pressure).<br>

> Note each instruction has its own exec-width, which may not be equal to<br>

> program SIMD width.<br>

> Normally we would allocate contiguous registers for divergent value.<br>

> For example, we have a program compiled as SIMD 8, we need to allocate 4<br>

> byte*8=32 byte<br>

> value for a divergent float/i32 value. But if there is a 'short type' value,<br>

> it only needs 2 byte*8=16 byte, that is half of a 32-byte-register.<br>

> we may also allocate for 'uniform' value, a uniform value only needs<br>

> type-sized register,<br>

> without multiply 'simd-width'. A uniform float/i32 value only needs 4 byte<br>

> physical register.<br>

> Thus a 32-byte-register can hold up to 8 different uniform float/i32 values.<br>

><br>

> Sometimes we also need to access a register in a strided way. Like a bitcast<br>

> from i64 to v2i32,<br>

> we need to access the i64 register with horizontal stride of 2.<br>

> In the example below, the i64 value is held in r10 and r11. L/H stands for<br>

> the low 32bit/high 32bit.<br>

> And the simd width of the program is SIMD 8, so we have 8 pairs of L/H.<br>

> r10: L H L H L H L H<br>

> r11: L H L H L H L H<br>

> below two instructions will extract the low 32bit and high 32bit part.<br>

> mov(8 | M0) r12.0<1>, r10.0<8,4,2>:D<br>

> mov(8 | M0) r13.0<1>, r10.1<8,4,2>:D<br>

> (The format of a register region is RegNum.regSubNum<vertStride, width,<br>

> horzStride>:type)<br>

> (Note the regSubNum is measured in units of the register type here.)<br>

> then r12/r13 contains the result vector components.<br>

> You can refer below link for more details on Intel GPU assembly and<br>

> register usage:<br>

> <a href="https://software.intel.com/en-us/articles/introduction-to-gen-assembly" rel="noreferrer" target="_blank">https://software.intel.com/en-<wbr>us/articles/introduction-to-<wbr>gen-assembly</a><br>

><br>

> I notice the hardware encoding of a register is 16 bit. that is not enough<br>

> to encode all the<br>

> register region parameters(regNum, type, hstride, vstride, width,...) in<br>

> RegisterInfo.td. And I am not sure<br>

> which is the reasonable place to hold this stride/type/width information<br>

> for a physical register.<br>

> Maybe some other .cpp file is more suitable than RegisterInfo.td file?<br>

> Because I need to change the register<br>

> region parameters in the bitcast instruction( from qword with hstride 1 to<br>

> dword with hstride 2)<br>

> At which stage is suitable to do such bitcast instruction logic? after<br>

> reg-alloc?<br>

><br>

<br>

</div></div>Hi,<br>

<br>

I would recommend encoding some of the register region parameters as part<br>

of the instruction rather than using the register encoding, because<br>

something like 'width' seems more like a property of the instruction<br>

than of the register to me.<br>

<br>

-Tom<br>

<span class=""></span><br></blockquote><div>Hi Tom,<br><br></div><div>Thanks for your suggestion. I agree that some region parameters need to be part<br></div><div>of the instruction descriptor. But it is a little hard for me to decide which parameters<br></div><div>should go into the instruction descriptor and which should be declared in RegisterInfo.td. My<br></div><div>current idea is to describe uniform/non-uniform registers in RegisterInfo.td, while the other<br></div><div>register region parameters (like stride etc.) are left to the instruction descriptor. The SIMD width of the <br>compiled program determines the width of a non-uniform register (normally 8 lanes or 16 lanes),<br>so I think this should be included in RegisterInfo.td. If a value is non-uniform, I would assign a non-uniform<br>register class to it. I am not sure whether this can be easily done in LLVM.<br>I don't know if there is any other way to do it instead of declaring<br>uniform/non-uniform registers in the RegisterInfo.td file. Please share with me<br>if you have an idea of how to allocate non-uniform registers if it is not handled in RegisterInfo.td.<br></div><div><br></div><div>- Ruiling<br></div></div></div></div>
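<div>To make the sizing rule and the strided access discussed in this thread concrete, here is a minimal Python sketch. The helper names value_bytes and region_elements are hypothetical illustrations, not LLVM or Intel APIs. The region formula is an assumption: element i is taken to sit at regSubNum + (i / width) * vertStride + (i % width) * horzStride, counted in units of the operand type, which is consistent with the two mov instructions in the quoted message.<br></div>

```python
REG_BYTES = 32  # size of one general register (r0-r127), per the thread


def value_bytes(type_bytes, simd_width, uniform):
    """Bytes a value occupies: one element if uniform, one per lane otherwise."""
    return type_bytes if uniform else type_bytes * simd_width


def region_elements(sub_reg, vstride, width, hstride, exec_size):
    """Element indices (in units of the operand type) read by a register region.

    Assumed layout: rows of `width` elements, `hstride` apart within a row,
    with consecutive rows starting `vstride` elements apart.
    """
    return [sub_reg + (i // width) * vstride + (i % width) * hstride
            for i in range(exec_size)]


# SIMD8 float needs a full 32-byte register; SIMD8 short needs half of one.
print(value_bytes(4, 8, uniform=False), value_bytes(2, 8, uniform=False))
# A uniform float needs 4 bytes, so one register can hold 8 of them.
print(REG_BYTES // value_bytes(4, 8, uniform=True))
# dword indices read by the two movs (r10 holds dwords 0..7, r11 holds 8..15):
print(region_elements(0, 8, 4, 2, 8))  # the L (low 32-bit) halves
print(region_elements(1, 8, 4, 2, 8))  # the H (high 32-bit) halves
```

<div>Under these assumptions, the region starting at sub-register 0 touches only the even dwords of r10 and r11, and the region starting at sub-register 1 touches only the odd ones. Only the SIMD width and uniformity affect how many bytes must be allocated; the strides affect how an instruction reads them, which fits the split Tom suggested.<br></div>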