<div dir="ltr">It seems the ARM target uses REG_SEQUENCE to form a register tuple and lets the store instruction take that tuple as a single operand.<div>Did I understand that correctly? What if the address is 64-bit while the value is 32-bit? Is there a simple way to handle that? REG_SEQUENCE looks like it only accepts sub-registers of the same type.</div><div><br></div><div>But the _real_ difficulty for me is that I have already run out of lane mask bits.</div><div>I gave a brief introduction to the Intel GPU registers in this thread:</div><div><a href="http://lists.llvm.org/pipermail/llvm-dev/2016-August/103953.html">http://lists.llvm.org/pipermail/llvm-dev/2016-August/103953.html</a><br></div><div><br></div><div>In a later attempt, I ran out of lane mask bits:</div><div><a href="http://lists.llvm.org/pipermail/llvm-dev/2016-August/104017.html">http://lists.llvm.org/pipermail/llvm-dev/2016-August/104017.html</a><br></div><div>So I chose to define all register tuples using only the 16-bit registers Rw0 to Rw2047 and the sub-register indices subw0 to subw31. With that, the largest tuple class I could reach was RegQ_SIMD8!</div><div>Here are some pieces of my RegisterInfo.td:</div><div><div>foreach Index = 0-31 in {</div><div>  def subw#Index : SubRegIndex<16, !shl(Index, 4)>;</div><div>}<br></div><div> </div><div>class IntelGPUReg<string n, bits<13> regIdx> : Register<n> {</div><div>  bits<1> regFile;</div><div><br></div><div>  let Namespace = "IntelGPU";</div><div>  let HWEncoding{12-0}  = regIdx;</div><div>  let HWEncoding{15}    = regFile;</div><div>}</div><div>foreach Index = 0-2047 in {</div><div>  def Rw#Index : IntelGPUReg <"Rw"#Index, !shl(Index, 1)> {</div><div>    let regFile = 0;</div><div>  }</div><div>}</div><div><br></div><div>// b-->byte w-->word d-->dword q-->qword</div><div><br></div><div>def gpr_w : RegisterClass<"IntelGPU", [i16], 16,</div><div>                          (sequence "Rw%u", 0, 2047)> {</div><div>  let AllocationPriority = 1;</div><div>}</div></div><div><br></div><div><div>def gpr_q_simd8 : RegisterTuples<[subw0, subw1, subw2, subw3, subw4, subw5, subw6, subw7,</div><div>                                  subw8, subw9, subw10, subw11, subw12, subw13, subw14, subw15,</div><div>                                  subw16, subw17, subw18, subw19, subw20, subw21, subw22, subw23,</div><div>                                  subw24, subw25, subw26, subw27, subw28, subw29, subw30, subw31],</div><div>                                 [(add (decimate gpr_w, 16)),</div><div>                                  (add (decimate (shl gpr_w, 1), 16)),</div><div>                                  (add (decimate (shl gpr_w, 2), 16)),</div><div>                                  (add (decimate (shl gpr_w, 3), 16)),</div><div>                                  (add (decimate (shl gpr_w, 4), 16)),</div><div>                                  (add (decimate (shl gpr_w, 5), 16)),</div><div>                                  (add (decimate (shl gpr_w, 6), 16)),</div></div><div>....</div><div><div>                                  (add (decimate (shl gpr_w, 30), 16)),</div><div>                                  (add (decimate (shl gpr_w, 31), 16))]>;</div></div><div><br></div><div>def RegQ_SIMD8 : RegisterClass<"IntelGPU", [i64, f64], 64, (add gpr_q_simd8)>;<br></div><div><br></div><div>If I introduce a larger register tuple, I need more lane mask bits.</div><div>Maybe I need to find some other way, or increase the number of lane mask bits substantially.</div><div>But for now that is hard for me, as I am not very familiar with the LLVM register allocator. 
Any suggestions?</div><div>If I have not stated the problem clearly, please feel free to write back.</div><div><br></div><div>- Ruiling</div><div class="gmail_extra"><br><div class="gmail_quote">2016-09-11 14:50 GMT+08:00 Tim Northover <span dir="ltr"><<a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 9 September 2016 at 21:19, Quentin Colombet via llvm-dev<br>

<span class="gmail-"><<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

> Make the store instruction take only one operand, a tuple register.<br>

> You have examples of tuple registers in the ARM backend.<br>

<br>

</span>The difficult bit will be if there are loads with the same property. I<br>

don't think you can easily encode the fact that one half of a register<br>

is read and the other written.<br>

<span class="gmail-HOEnZb"><font color="#888888"><br>

Tim.<br>

</font></span></blockquote></div><br></div></div>
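<div>A rough way to see why 32 subw indices are the ceiling: the Python sketch below is my own model, not LLVM code. It mimics how the TableGen set operators <code>shl</code> (drop the first N elements) and <code>decimate</code> (take every N'th element) expand <code>gpr_q_simd8</code>, then counts lane mask bits. Two things are assumptions: that RegisterTuples zips its lists and stops at the shortest one, and that each leaf sub-register index consumes one bit of a 32-bit lane mask (consistent with subw0 to subw31 hitting the limit). The SIMD16 qword case is a hypothetical larger tuple, not something defined in the .td above.</div>

```python
# Hypothetical model, not LLVM code: it mimics how TableGen's set operators
# expand the gpr_q_simd8 RegisterTuples definition from RegisterInfo.td,
# and counts lane mask bits under the assumptions stated in the comments.

NUM_WORD_REGS = 2048   # Rw0 .. Rw2047, i.e. (sequence "Rw%u", 0, 2047)
WORD_BITS = 16         # each subw#Index is SubRegIndex<16, ...>
NUM_SUBW = 32          # subw0 .. subw31
LANE_MASK_WIDTH = 32   # assumption: 32 usable bits, since subw0..31 hit the limit


def shl(regs, n):
    """TableGen set operator (shl S, N): drop the first N elements."""
    return regs[n:]


def decimate(regs, n):
    """TableGen set operator (decimate S, N): every N'th element, from the first."""
    return regs[::n]


def expand_tuples(regs, num_subregs, stride):
    """Build one list per sub-register index, then zip them into tuples.

    List k is (decimate (shl regs, k), stride). Assumption: like TableGen's
    RegisterTuples expansion, the zip stops at the shortest list.
    """
    lists = [decimate(shl(regs, k), stride) for k in range(num_subregs)]
    length = min(len(lst) for lst in lists)
    return [tuple(lst[j] for lst in lists) for j in range(length)]


def leaf_lane_bits(tuple_bits, leaf_bits=WORD_BITS):
    """Assumption: every leaf sub-register index needs its own lane mask bit."""
    return tuple_bits // leaf_bits


def fits_in_lane_mask(tuple_bits):
    """True if a tuple of this many bits needs no more leaf bits than the mask has."""
    return leaf_lane_bits(tuple_bits) <= LANE_MASK_WIDTH


gpr_w = [f"Rw{i}" for i in range(NUM_WORD_REGS)]
gpr_q_simd8 = expand_tuples(gpr_w, NUM_SUBW, 16)

if __name__ == "__main__":
    print(len(gpr_q_simd8), "tuples")
    print(gpr_q_simd8[0][0], "..", gpr_q_simd8[0][-1])
    print(gpr_q_simd8[1][0], "..", gpr_q_simd8[1][-1])
    # RegQ_SIMD8 is 8 lanes x 64 bits; a hypothetical SIMD16 qword tuple doubles it.
    for name, bits in [("qword SIMD8", 8 * 64), ("qword SIMD16 (hypothetical)", 16 * 64)]:
        print(name, leaf_lane_bits(bits), "leaf bits, fits:", fits_in_lane_mask(bits))
```

<div>Under these assumptions the expansion yields 127 tuples: tuple j covers Rw(16j) to Rw(16j+31), so neighbouring tuples overlap by 16 words. A SIMD8 qword tuple (512 bits) needs exactly 32 leaf bits, while a SIMD16 qword tuple (1024 bits) would need 64, which is why any larger tuple built from 16-bit leaves runs out of lane mask bits.</div>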