<div dir="ltr"><div>Hello everyone,</div><div><br></div><div>I am trying to write a new LLVM backend target for Intel GPUs.<br>I would start by targeting the OpenCL language first.</div><div>But I am not very familiar with the LLVM backend infrastructure.</div><div>I have some problems describing the RegisterInfo.<br><br></div><div>An Intel GPU launches many hardware threads to run a GPGPU workload.</div><div>Each hardware thread has 128 registers (r0-r127), each 32 bytes in size.<br></div><div>Each hardware thread may run in SIMD 8/16/32 mode, which maps to</div><div>8/16/32 OpenCL work-items. The SIMD width is chosen at</div><div>compile time (normally according to register pressure; a bigger SIMD width means higher register pressure).</div><div>Note that each instruction has its own exec-width, which may not be equal to the program SIMD width.</div><div>Normally we allocate contiguous registers for a divergent value.</div><div>For example, in a program compiled as SIMD 8, we need to allocate 4 bytes * 8 = 32 bytes</div><div>for a divergent float/i32 value. But a 'short' type value</div><div>only needs 2 bytes * 8 = 16 bytes, that is, half of a 32-byte register.<br></div><div>We may also allocate registers for 'uniform' values. A uniform value only needs a type-sized register,</div><div>without multiplying by the SIMD width. A uniform float/i32 value only needs 4 bytes of a physical register.</div><div>Thus a 32-byte register can hold up to 8 different uniform float/i32 values.<br></div><div><br>Sometimes we also need to access a register in a strided way. For a bitcast from i64 to v2i32, for example,<br>we need to access the i64 register with a horizontal stride of 2.</div><div>In the example below, the i64 value is held in r10 and r11. 
L/H stands for the low/high 32 bits.</div><div>The SIMD width of the program is 8, so we have 8 pairs of L/H.<br>r10: L H L H L H L H<br>r11: L H L H L H L H</div><div>The two instructions below extract the low 32-bit and high 32-bit parts.<br>mov(8 | M0) r12.0&lt;1&gt;, r10.0&lt;8,4,2&gt;:D</div><div>mov(8 | M0) r13.0&lt;1&gt;, r10.1&lt;8,4,2&gt;:D</div><div>(The format of a register region is RegNum.regSubNum&lt;vertStride, width, horzStride&gt;:type)</div><div>(Note that regSubNum is measured in units of the register type here.)</div><div>After that, r12/r13 contain the resulting vector components.</div><div>You can refer to the link below for more details on Intel GPU assembly and register usage:</div><div><div><a href="https://software.intel.com/en-us/articles/introduction-to-gen-assembly" target="_blank">https://software.intel.com/en-<wbr>us/articles/introduction-to-<wbr>gen-assembly</a><br></div></div><div><br></div><div>I notice that the hardware encoding of a register is 16 bits, which is not enough to encode all the</div><div>register region parameters (regNum, type, hstride, vstride, width, ...) in RegisterInfo.td. I am not sure</div><div>where the reasonable place is to hold this stride/type/width information for a physical register.</div><div>Maybe some other .cpp file is more suitable than the RegisterInfo.td file? I ask because I need to change the register</div><div>region parameters in the bitcast instruction (from qword with hstride 1 to dword with hstride 2).</div><div>At which stage is it suitable to implement such bitcast instruction logic? 
After reg-alloc?<br><br>The detailed hardware spec is located at:<br><a href="https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol07-3d_media_gpgpu_3.pdf" target="_blank">https://01.org/sites/default/<wbr>files/documentation/intel-gfx-<wbr>prm-osrc-bdw-vol07-3d_media_<wbr>gpgpu_3.pdf</a><br>Page 921 describes the detailed instruction encoding format.</div><div>It needs (regFile, regNum, subRegNum, width, type, addrMode, hStride, vStride) to describe a register.<br><br></div><div>I have attached the first version of my RegisterInfo.td.</div><div>I also have a question about the attached RegisterInfo.td file. Do I have to define different SubRegIndex records</div><div>like the ones below to make TableGen work correctly?</div><div><br></div><div>foreach Index = 0-15 in {</div><div> def subd#Index : SubRegIndex&lt;32, !shl(Index, 5)&gt;; // used as SubRegIndex when declaring gpr_d_simd8</div><div> def subw#Index : SubRegIndex&lt;16, !shl(Index, 4)&gt;; // used as SubRegIndex when declaring gpr_w_simd8</div><div> ...</div><div>}</div><div><br></div><div>If anything I said is unclear, just reply to this mail. Thanks for any help!</div><div><br></div><div>Thanks!</div><div>Ruiling</div></div>
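<div><br></div><div>To make the register-region addressing in the bitcast example concrete, here is a minimal Python sketch that computes which register and byte offset each SIMD channel reads for a region such as r10.0&lt;8,4,2&gt;:D. It is not taken from any Intel or LLVM API: the Region class and the channel_locations function are hypothetical names, and the addressing formula is an assumption based on the region format described above (strides and regSubNum counted in elements of the register type, 32-byte registers).</div>

```python
from dataclasses import dataclass

REG_BYTES = 32  # each register is 32 bytes, as stated in the post


@dataclass(frozen=True)
class Region:
    """Hypothetical model of a region: RegNum.regSubNum, strides, width, type."""
    reg_num: int
    sub_reg_num: int   # in units of the element type
    vert_stride: int   # in elements, step between rows
    width: int         # number of elements per row
    horz_stride: int   # in elements, step within a row
    type_bytes: int    # element size in bytes, 4 for :D


def channel_locations(region, exec_size):
    """Return (register number, byte offset in that register) per channel."""
    base = region.reg_num * REG_BYTES + region.sub_reg_num * region.type_bytes
    locations = []
    for channel in range(exec_size):
        # Channels fill one row of `width` elements, then move to the next row.
        row, col = divmod(channel, region.width)
        element = row * region.vert_stride + col * region.horz_stride
        byte = base + element * region.type_bytes
        locations.append(divmod(byte, REG_BYTES))
    return locations


# Sources of the two mov instructions: low halves start at r10.0, high at r10.1.
low = channel_locations(Region(10, 0, 8, 4, 2, 4), exec_size=8)
high = channel_locations(Region(10, 1, 8, 4, 2, 4), exec_size=8)
print(low)
print(high)
```

<div>Under these assumptions, the low halves are read from byte offsets 0, 8, 16, 24 of r10 and then of r11, and the high halves from offsets 4, 12, 20, 28, which matches the L H L H layout shown in the example.</div>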