[llvm-dev] how to allocate consecutive register?
Ruiling Song via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 12 05:48:25 PDT 2016
It seems the ARM target uses reg_sequence to form a register tuple and lets
the store instruction accept that register tuple.
Did I understand that correctly? What if the address is 64-bit while the
value is 32-bit? Is there a simple way to handle that? reg_sequence looks
like it only accepts sub-registers of the same type.
But the _real_ difficulty for me is that I have already run out of lanemask
bits. I gave a brief introduction to the Intel GPU register file in this thread:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103953.html
In a later trial, I hit the problem of running out of lanemask bits:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104017.html
Later I chose to define all register tuples using only Rw0~2047 and
subw0~31. With that, the largest class I could reach was RegQ_SIMD8!
Some pieces of RegisterInfo.td are listed below:
foreach Index = 0-31 in {
  def subw#Index : SubRegIndex<16, !shl(Index, 4)>;
}

class IntelGPUReg<string n, bits<13> regIdx> : Register<n> {
  bits<1> regFile;

  let Namespace = "IntelGPU";
  let HWEncoding{12-0} = regIdx;
  let HWEncoding{15} = regFile;
}
foreach Index = 0-2047 in {
  def Rw#Index : IntelGPUReg <"Rw"#Index, !shl(Index, 1)> {
    let regFile = 0;
  }
}

// b-->byte w-->word d-->dword q-->qword

def gpr_w : RegisterClass<"IntelGPU", [i16], 16,
                          (sequence "Rw%u", 0, 2047)> {
  let AllocationPriority = 1;
}

def gpr_q_simd8 : RegisterTuples<
  [subw0, subw1, subw2, subw3, subw4, subw5, subw6, subw7,
   subw8, subw9, subw10, subw11, subw12, subw13, subw14, subw15,
   subw16, subw17, subw18, subw19, subw20, subw21, subw22, subw23,
   subw24, subw25, subw26, subw27, subw28, subw29, subw30, subw31],
  [(add (decimate gpr_w, 16)),
   (add (decimate (shl gpr_w, 1), 16)),
   (add (decimate (shl gpr_w, 2), 16)),
   (add (decimate (shl gpr_w, 3), 16)),
   (add (decimate (shl gpr_w, 4), 16)),
   (add (decimate (shl gpr_w, 5), 16)),
   (add (decimate (shl gpr_w, 6), 16)),
   ....
   (add (decimate (shl gpr_w, 30), 16)),
   (add (decimate (shl gpr_w, 31), 16))]>;

def RegQ_SIMD8 : RegisterClass<"IntelGPU", [i64, f64], 64,
                               (add gpr_q_simd8)>;
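
To make the gpr_q_simd8 definition above more concrete, here is a small
Python model of how it expands. It is only a sketch and not part of the
backend. It assumes that (shl S, N) drops the first N registers of a set,
that (decimate S, N) keeps every N-th register starting with the first, and
that RegisterTuples zips the per-index lists and stops at the shortest one.
The helper names are made up for illustration.

```python
# Illustrative model of the TableGen set operators used by gpr_q_simd8.
# The function names below are hypothetical helpers, not an LLVM API.

def shl(regs, n):
    """Drop the first n elements, modelling (shl S, N)."""
    return regs[n:]

def decimate(regs, n):
    """Keep every n-th element starting with the first, modelling (decimate S, N)."""
    return regs[::n]

def register_tuples(num_subregs, regs, stride):
    """Build one list per sub-register index and zip them into tuples."""
    lists = [decimate(shl(regs, k), stride) for k in range(num_subregs)]
    return list(zip(*lists))  # zip stops at the shortest list

gpr_w = [f"Rw{i}" for i in range(2048)]
tuples = register_tuples(32, gpr_w, 16)

print(len(tuples))                   # how many 32-word tuples exist
print(tuples[0][0], tuples[0][-1])   # first and last word of tuple 0
print(tuples[1][0], tuples[1][-1])   # first and last word of tuple 1
```

Under these assumptions every tuple covers 32 consecutive word registers
(512 bits), and consecutive tuples start 16 words apart, so neighbouring
tuples overlap by half.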
If I introduce larger register tuples, I will need more lanemask bits.
Maybe I need to find some other way, or greatly increase the number of
lanemask bits. But for now that is hard for me, as I am not very familiar
with the LLVM register allocator. Any suggestions?
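
The arithmetic behind the lanemask limit can be sketched in a few lines of
Python. This is a rough estimate rather than how LLVM computes lane masks:
it assumes one lanemask bit per 16-bit subw index and a 32-bit lanemask,
which matches the subw0~31 limit described above. The class names other
than RegQ_SIMD8 are hypothetical.

```python
# Assumptions, not taken from LLVM itself: one lanemask bit per 16-bit
# sub-register index, and a lanemask that is 32 bits wide.
SUBREG_BITS = 16
LANEMASK_BITS = 32

def lane_bits_needed(elem_bits, simd_width):
    """Number of 16-bit sub-register indices needed to cover one tuple."""
    return elem_bits * simd_width // SUBREG_BITS

candidates = [
    ("RegQ_SIMD8", 64, 8),     # class from the listing above
    ("RegD_SIMD16", 32, 16),   # hypothetical class
    ("RegQ_SIMD16", 64, 16),   # hypothetical class
]

for name, elem_bits, simd_width in candidates:
    need = lane_bits_needed(elem_bits, simd_width)
    verdict = "fits" if need <= LANEMASK_BITS else "does not fit"
    print(f"{name}: {need} lane bits, {verdict}")
```

With these numbers a 512-bit tuple already uses all 32 bits, and a qword
SIMD16 tuple would need 64.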
If I have not stated the problem clearly, please feel free to drop me a mail.
- Ruiling
2016-09-11 14:50 GMT+08:00 Tim Northover <t.p.northover at gmail.com>:
> On 9 September 2016 at 21:19, Quentin Colombet via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Make the store instruction takes only one operand, a tuple register.
> > You have examples of tuple registers in the ARM backend.
>
> The difficult bit will be if there are loads with the same property. I
> don't think you can easily encode the fact that one half of a register
> is read and the other written.
>
> Tim.
>