[llvm-dev] how to allocate consecutive register?

Mon Sep 12 05:48:25 PDT 2016

It seems the ARM target uses reg_sequence to form a register tuple and lets
the store instruction accept that register tuple.
Did I understand that correctly? What if the address is 64-bit while the value
is 32-bit? Is there a simple way to handle that? reg_sequence looks like it
only accepts sub-registers of the same type.

But the _real_ difficulty for me is that I have already run out of lanemask
bits. I gave a brief introduction to the Intel GPU registers in this thread:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103953.html

And in a later trial, I hit the problem of running out of lanemask bits:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104017.html
Later I chose to define all register tuples using only Rw0~2047 and
subw0~31, and with that I could reach RegQ_SIMD8 at most!
Some pieces of RegisterInfo.td are listed below:
foreach Index = 0-31 in {
  def subw#Index : SubRegIndex<16, !shl(Index, 4)>;
}
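
To make the layout those definitions describe easier to see, here is a minimal Python sketch that models it. It assumes the two SubRegIndex arguments are the size and the offset in bits, so subw<Index> is a 16-bit slice starting at bit Index * 16. The helper name subw_index is hypothetical and only exists for this illustration.

```python
# Model of: def subw#Index : SubRegIndex<16, !shl(Index, 4)>;
# Assumption: the arguments are (size in bits, offset in bits).

def subw_index(index):
    """Return (size_bits, offset_bits) for the modeled subw<index>."""
    return 16, index << 4  # !shl(Index, 4) is Index * 16

layout = {f"subw{i}": subw_index(i) for i in range(32)}

# Total number of bits covered by subw0..subw31.
total_bits = sum(size for size, _ in layout.values())

if __name__ == "__main__":
    for name in ("subw0", "subw1", "subw31"):
        print(name, layout[name])
    print("total bits:", total_bits)
```

Under that assumption the 32 indices cover 512 contiguous bits, which is the size of eight 64-bit values.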

class IntelGPUReg<string n, bits<13> regIdx> : Register<n> {
  bits<1> regFile;

  let Namespace = "IntelGPU";
  let HWEncoding{12-0}  = regIdx;
  let HWEncoding{15}    = regFile;
}
foreach Index = 0-2047 in {
  def Rw#Index : IntelGPUReg <"Rw"#Index, !shl(Index, 1)> {
    let regFile = 0;
  }
}

// b-->byte w-->word d-->dword q-->qword

def gpr_w : RegisterClass<"IntelGPU", [i16], 16,
                          (sequence "Rw%u", 0, 2047)> {
  let AllocationPriority = 1;
}

def gpr_q_simd8 : RegisterTuples<[subw0, subw1, subw2, subw3, subw4, subw5, subw6, subw7,
                                  subw8, subw9, subw10, subw11, subw12, subw13, subw14, subw15,
                                  subw16, subw17, subw18, subw19, subw20, subw21, subw22, subw23,
                                  subw24, subw25, subw26, subw27, subw28, subw29, subw30, subw31],
                                [(add (decimate gpr_w, 16)),
                                 (add (decimate (shl gpr_w, 1), 16)),
                                 (add (decimate (shl gpr_w, 2), 16)),
                                 (add (decimate (shl gpr_w, 3), 16)),
                                 (add (decimate (shl gpr_w, 4), 16)),
                                 (add (decimate (shl gpr_w, 5), 16)),
                                 (add (decimate (shl gpr_w, 6), 16)),
....
                                 (add (decimate (shl gpr_w, 30), 16)),
                                 (add (decimate (shl gpr_w, 31), 16))]>;
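
As a rough illustration of which tuples this definition produces, the following Python sketch models the set operators. It assumes that (shl S, N) drops the first N elements, that (decimate S, N) keeps every N-th element starting with the first, and that RegisterTuples zips the lists and stops at the shortest one. It also assumes the elided entries follow the same (shl gpr_w, k) pattern for k = 7..29. The Python function names are only stand-ins for the TableGen operators.

```python
# Model of the TableGen set operators used by gpr_q_simd8 (assumed semantics).

def shl(regs, n):
    """Drop the first n registers."""
    return regs[n:]

def decimate(regs, n):
    """Keep every n-th register, starting with the first."""
    return regs[::n]

def register_tuples(sub_lists):
    """Zip the sub-register lists; zip() stops at the shortest list."""
    return list(zip(*sub_lists))

gpr_w = [f"Rw{i}" for i in range(2048)]

# One list per sub-register index subw0..subw31.
sub_lists = [decimate(shl(gpr_w, k), 16) for k in range(32)]
tuples = register_tuples(sub_lists)

if __name__ == "__main__":
    print(len(tuples))                       # number of tuples generated
    print(tuples[0][0], "..", tuples[0][-1])  # first tuple's word range
    print(tuples[1][0], "..", tuples[1][-1])  # second tuple's word range
```

In this model each tuple spans 32 consecutive words but the start advances by only 16 words, so neighbouring tuples overlap by half.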

def RegQ_SIMD8 : RegisterClass<"IntelGPU", [i64, f64], 64, (add gpr_q_simd8)>;
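
The lanemask pressure can be put in numbers with a small Python sketch. It assumes a 32-bit lane mask with one bit consumed per 16-bit sub-register index (subw0..subw31); the constant LANE_MASK_BITS and both helper functions are hypothetical names for this illustration, not LLVM APIs.

```python
# Back-of-the-envelope lane budget, under the assumptions stated above.

LANE_MASK_BITS = 32  # assumption: width of the lane mask in bits

def lanes_needed(tuple_bits, lane_bits=16):
    """Number of distinct word-sized lanes a tuple of tuple_bits needs."""
    return tuple_bits // lane_bits

def fits(tuple_bits, lane_bits=16, mask_bits=LANE_MASK_BITS):
    """True if every lane of the tuple can get its own mask bit."""
    return lanes_needed(tuple_bits, lane_bits) <= mask_bits

QWORD_SIMD8 = 8 * 64    # 512 bits, the RegQ_SIMD8 case
QWORD_SIMD16 = 16 * 64  # 1024 bits, a hypothetical larger tuple

if __name__ == "__main__":
    print(lanes_needed(QWORD_SIMD8), fits(QWORD_SIMD8))
    print(lanes_needed(QWORD_SIMD16), fits(QWORD_SIMD16))
```

With these assumptions RegQ_SIMD8 uses exactly 32 lanes, and a tuple twice as large would need 64.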

If I introduce a larger register tuple, then I need more lanemask bits.
Maybe I need to find some other way, or greatly increase the number of
lanemask bits. But for now that is hard for me, as I am not very familiar
with the LLVM register allocator. Any suggestions?
If I have not stated the problem clearly, please feel free to drop me a mail.

- Ruiling

2016-09-11 14:50 GMT+08:00 Tim Northover <t.p.northover at gmail.com>:

> On 9 September 2016 at 21:19, Quentin Colombet via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Make the store instruction takes only one operand, a tuple register.
> > You have examples of tuple registers in the ARM backend.
>
> The difficult bit will be if there are loads with the same property. I
> don't think you can easily encode the fact that one half of a register
> is read and the other written.
>
> Tim.
>