[llvm-dev] How to describe the RegisterInfo?

Tue Aug 23 11:34:13 PDT 2016

> On Aug 23, 2016, at 12:08 AM, Ruiling Song via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Yes, the arch is just as you said, something like the AMD GPU, but Intel GPUs don't have separate register files for scalar and vector values.
> In fact, my idea of defining the register tuples was borrowed from SIRegisterInfo.td in the AMD GPU backend.
> But it seems that the AMD GPU mainly supports i32/i64 register types, while the Intel GPU also supports byte/short register types.
> So I have to start by defining the registers at 'byte' granularity, and then build up the registers of other types through RegisterTuples.
> I thought RegisterTuples was a way of expressing register aliases in the RegisterInfo.td file. I am not sure whether I understand it correctly. My first attempt looked like the code below (to keep things simple, I removed some WORD/QWORD register classes):
> let Namespace = "IntelGPU" in {
> 
> foreach Index = 0-15 in {
>   def sub#Index : SubRegIndex<32, !shl(Index, 5)>;
> }
> }
> 
> class IntelGPUReg<string n, bits<13> regIdx> : Register<n> {
>   bits<2> HStride;
>   bits<1> regFile;
> 
>   let Namespace = "IntelGPU";
>   let HWEncoding{12-0}  = regIdx;
>   let HWEncoding{15}    = regFile;
> }
> // here I define the whole 4096 byte registers
> foreach Index = 0-4095 in {
>   def Rb#Index : IntelGPUReg <"Rb"#Index, Index> {
>     let regFile = 0;
>   }
> }
> 
> // b-->byte w-->word d-->dword q-->qword
> // the set of uniform byte register
> def gpr_b : RegisterClass<"IntelGPU", [i8], 8,
>                           (sequence "Rb%u", 0, 4095)> {
>   let AllocationPriority = 1;
> }
> 
> def gpr_d : RegisterTuples<[sub0, sub1, sub2, sub3],
>                               [(add (decimate gpr_b, 4)),
>                                (add (decimate (shl gpr_b, 1), 4)),
>                                (add (decimate (shl gpr_b, 2), 4)),
>                                (add (decimate (shl gpr_b, 3), 4))]>;
> 
> // simd byte use stride 2 register as stride 1 does not support useful ALU instruction
> def gpr_b_simd8 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7],
>                                  [(add (decimate gpr_b, 16)),
>                                   (add (decimate (shl gpr_b, 2), 16)),
>                                   (add (decimate (shl gpr_b, 4), 16)),
>                                   (add (decimate (shl gpr_b, 6), 16)),
>                                   (add (decimate (shl gpr_b, 8), 16)),
>                                   (add (decimate (shl gpr_b, 10), 16)),
>                                   (add (decimate (shl gpr_b, 12), 16)),
>                                   (add (decimate (shl gpr_b, 14), 16))]>;
> 
> def gpr_d_simd8 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7],
>                                 [(add (decimate gpr_d, 8)),
>                                  (add (decimate (shl gpr_d, 1), 8)),
>                                  (add (decimate (shl gpr_d, 2), 8)),
>                                  (add (decimate (shl gpr_d, 3), 8)),
>                                  (add (decimate (shl gpr_d, 4), 8)),
>                                  (add (decimate (shl gpr_d, 5), 8)),
>                                  (add (decimate (shl gpr_d, 6), 8)),
>                                  (add (decimate (shl gpr_d, 7), 8))]>;
> def RegD_Uniform : RegisterClass<"IntelGPU", [i32, f32], 32, (add gpr_d)>;
> def RegD_SIMD8 : RegisterClass<"IntelGPU", [i32, f32], 32, (add gpr_d_simd8)> {
> }
> This makes it easy for me to define the register alias information. But it doesn't work!
> TableGen exits and tells me: "error:Ran out of lanemask bits to represent subregister sub1_then_sub1"
> Does anybody know what's wrong here?

Lane masks are used in several places in the compiler to describe which subregister parts of a register are live or dead. The number of lanes you need is the number of different subregisters you can reach from your largest register (which may be a tuple). In your example I would expect that from a gpr_d_simd8 you can reach 8 gpr_d registers through sub0-sub7, and from each gpr_d you can reach 4 gpr_b registers through sub0-sub3. That is 8 * 4 = 32 byte registers, which should be fine with 32 bits/lanes. I am not sure whether that is the problem here, but I think you should use different subregister indices for the byte access (bsub0-bsub3) than the ones you use for the higher-level tuples.
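
A minimal, untested sketch of that suggestion, written as a fragment to replace the sub#Index loop and the gpr_d definition in the quoted file. It reuses gpr_b and the IntelGPU namespace from the quoted message. The bsub0-bsub3 names come from the paragraph above; the 8-bit size and the byte offsets are assumptions based on gpr_b holding i8 values. Whether this alone makes the lane mask error go away is not confirmed.

```tablegen
let Namespace = "IntelGPU" in {
  // Byte-level indices (assumed 8 bits wide): used only to reach the
  // four gpr_b registers inside one dword register.
  // Bit offset is Index * 8.
  foreach Index = 0-3 in {
    def bsub#Index : SubRegIndex<8, !shl(Index, 3)>;
  }

  // Dword-level indices: used only by the higher-level tuples such as
  // gpr_d_simd8. Bit offset is Index * 32.
  foreach Index = 0-7 in {
    def sub#Index : SubRegIndex<32, !shl(Index, 5)>;
  }
}

// Dword registers built from four consecutive byte registers, now
// addressed through bsub0-bsub3 instead of sub0-sub3.
def gpr_d : RegisterTuples<[bsub0, bsub1, bsub2, bsub3],
                           [(add (decimate gpr_b, 4)),
                            (add (decimate (shl gpr_b, 1), 4)),
                            (add (decimate (shl gpr_b, 2), 4)),
                            (add (decimate (shl gpr_b, 3), 4))]>;
```

With this split, gpr_d_simd8 can keep using sub0-sub7 as in the quoted code. A byte-level tuple such as gpr_b_simd8 would presumably need its own index set as well, which this sketch leaves out.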

You could also experiment with increasing the limit in TableGen and changing the LaneBitmask typedef; however, this may affect memory use and the performance of the register allocator, so it would be better to find a way to avoid it.

- Matthias