<div dir="ltr">Actually, my hardware is designed with 32 lanes, each of which has 8 registers, and the emitted assembly code should take this into account.<div>I defined the registers in the .td file in the following order:</div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_0_R_0, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_0_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_1, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_0_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_2, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_0_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_3, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_0_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_4, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_0_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_5, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_0_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_6, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_0_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_7,</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_1_R_0, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_1_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_1, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_1_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_2, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_1_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_3, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_1_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_4, </span></div><div><span 
style="color:rgb(80,0,80);font-size:12.8px">L_1_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_5, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_1_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_6, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_1_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_7,</span></div></div><div><span style="color:rgb(80,0,80);font-size:12.8px">...................</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_31_R_0, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_31_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_1, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_31_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_2, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_31_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_3, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_31_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_4, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_31_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_5, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_31_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_6, </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">L_31_</span><span style="color:rgb(80,0,80);font-size:12.8px">R_7,</span></div></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">Now, when I assemble the vector sum code using the instructions I implemented, with the default x86 scheduling and register allocation, it uses only L_0, but it should use all the lanes. I would like to know 
how to achieve this.</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">Something as follows:</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">currently it is emitting as follows:</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><div style="font-size:12.8px">P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">    </span>L_0_R_0, Pword ptr [rip + b]</div><div style="font-size:12.8px"><span style="white-space:pre-wrap">  </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, Pword ptr [rip + c]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">  </span>P_2048B_VADD<span style="white-space:pre-wrap">    </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">   </span>P_2048B_STORE_DWORD<span style="white-space:pre-wrap">     </span>Pword ptr [rip + a], <span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap"> </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0, Pword ptr [rip + b+2048]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, Pword ptr [rip + c+2048]</span></div><div 
style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_VADD<span style="white-space:pre-wrap">    </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">   </span>P_2048B_STORE_DWORD<span style="white-space:pre-wrap">     </span>Pword ptr [rip + a+2048], <span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">    </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0, Pword ptr [rip + b+4096]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, Pword ptr [rip + c+4096]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_VADD<span style="white-space:pre-wrap">    </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">   </span>P_2048B_STORE_DWORD<span style="white-space:pre-wrap">     </span>Pword ptr [rip + a+4096], <span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">    </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_0_</span><span 
style="font-size:12.8px">R_0, Pword ptr [rip + b+6144]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, Pword ptr [rip + c+6144]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_VADD<span style="white-space:pre-wrap">    </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">   </span>P_2048B_STORE_DWORD<span style="white-space:pre-wrap">     </span>Pword ptr [rip + a+6144], <span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">It should emit as follows:</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><div style="font-size:12.8px">P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">   </span>L_0_R_0, Pword ptr [rip + b]</div><div style="font-size:12.8px"><span style="white-space:pre-wrap">  </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, Pword ptr [rip + c]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">  </span>P_2048B_VADD<span style="white-space:pre-wrap">    </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_1, </span><span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div 
style="font-size:12.8px"><span style="white-space:pre-wrap">   </span>P_2048B_STORE_DWORD<span style="white-space:pre-wrap">     </span>Pword ptr [rip + a], <span style="font-size:12.8px">L_0_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap"> </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_1_</span><span style="font-size:12.8px">R_0, Pword ptr [rip + b+2048]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_1_</span><span style="font-size:12.8px">R_1, Pword ptr [rip + c+2048]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_VADD<span style="white-space:pre-wrap">    </span><span style="font-size:12.8px">L_1_</span><span style="font-size:12.8px">R_0, </span><span style="font-size:12.8px">L_1_</span><span style="font-size:12.8px">R_1, </span><span style="font-size:12.8px">L_1_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">   </span>P_2048B_STORE_DWORD<span style="white-space:pre-wrap">     </span>Pword ptr [rip + a+2048], <span style="font-size:12.8px">L_1_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">    </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_2_</span><span style="font-size:12.8px">R_0, Pword ptr [rip + b+4096]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_2_</span><span style="font-size:12.8px">R_1, Pword ptr [rip + c+4096]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     
</span>P_2048B_VADD<span style="white-space:pre-wrap">    </span><span style="font-size:12.8px">L_2_</span><span style="font-size:12.8px">R_0, </span><span style="font-size:12.8px">L_2_</span><span style="font-size:12.8px">R_1, </span><span style="font-size:12.8px">L_2_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">   </span>P_2048B_STORE_DWORD<span style="white-space:pre-wrap">     </span>Pword ptr [rip + a+4096], <span style="font-size:12.8px">L_2_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">    </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_3_</span><span style="font-size:12.8px">R_0, Pword ptr [rip + b+6144]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_LOAD_DWORD<span style="white-space:pre-wrap">      </span><span style="font-size:12.8px">L_3_</span><span style="font-size:12.8px">R_1, Pword ptr [rip + c+6144]</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">     </span>P_2048B_VADD<span style="white-space:pre-wrap">    </span><span style="font-size:12.8px">L_3_</span><span style="font-size:12.8px">R_0, </span><span style="font-size:12.8px">L_3_</span><span style="font-size:12.8px">R_1, </span><span style="font-size:12.8px">L_3_</span><span style="font-size:12.8px">R_0</span></div><div style="font-size:12.8px"><span style="white-space:pre-wrap">   </span>P_2048B_STORE_DWORD<span style="white-space:pre-wrap">     </span>Pword ptr [rip + a+6144], <span style="font-size:12.8px">L_3_</span><span style="font-size:12.8px">R_0</span></div></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">does it involve changing the register live intervals? 
or scheduling?</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">Please help. I have been trying hard but have been unable to solve this.</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Aug 27, 2017 at 12:31 AM, Tim Northover <span dir="ltr"><<a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 26 August 2017 at 11:14, hameeza ahmed <<a href="mailto:hahmed2305@gmail.com">hahmed2305@gmail.com</a>> wrote:<br>
> Hello,<br>
><br>
> I have defined 8 registers in <a href="http://registerinfo.td" rel="noreferrer" target="_blank">registerinfo.td</a> file in the following order:<br>
> R_0, R_1, R_2, R_3, R_4, R_5, R_6, R_7<br>
><br>
> But the generated assembly code only uses 2 registers. How to enable it to<br>
> use all 8?<br>
<br>
</span>What are your thoughts on what might be the issue? Have you considered<br>
the advantages and disadvantages of using multiple registers for the<br>
code you're testing?<br>
<br>
Cheers.<br>
<span class="HOEnZb"><font color="#888888"><br>
Tim.<br>
</font></span></blockquote></div><br></div>