<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Dominik,<div class=""><br class=""></div><div class="">Thanks for sending this!<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Oct 7, 2020, at 5:21 AM, Dominik Montada via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Hi all,<br class=""><br class="">this is the second email for the round table follow-up, this time regarding the issues around the greedy RegBankSelect and alternative mappings.<br class=""><br class="">The issue I brought up was that because RegBankSelect goes top-down, it never looks at all available mappings for the operands when considering which of the mappings to apply to the current instruction. In our architecture we have one register bank dedicated to pointers and another one for anything else. We often see code where we have a G_PTR_ADD with a constant. Since the constant is not a pointer, we put it on the other register bank. We could put it on the address regbank and do provide alternative mappings for that, but since the greedy algorithm doesn't actually check the usage of the constant, it is always put on the other bank.<br class=""></div></div></blockquote><div><br class=""></div><div>The intent behind the greedy algorithm was for the mapping to look at X instructions ahead, depending on the optimization level, when assigning one instruction.</div><div>Right now the window is simply 1 instruction and the code is not structured to allow to have more than one.</div><div><br class=""></div><div>As we gain more insights on what we want RegBankSelect to do, it makes sense to redesign it. </div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class="">When RegBankSelect then sees the G_PTR_ADD and sees that one of its inputs is on the other register bank already, it then inserts a costly cross-bank copy instead of checking if that operand has any alternative mappings which would make the overall mapping for the current instruction cheaper.<br class=""></div></div></blockquote><div><br class=""></div><div>The idea is when a mapping is done, that means it was the best one at the time of the decision (greedy), thus we don’t challenge that.</div><div>“Reverting” an already set mapping is not that simple because yes for this particular use we are inserting costly cross copies, but there are no guarantees that replacing the mapping of the definition will not insert even more costly copies for the other uses.</div><div><br class=""></div><div>E.g., consider:</div><div>```</div><div><font face="Menlo" class="">A = def <— RBS starts here</font></div><div><font face="Menlo" class="">= useFP A</font></div><div><font face="Menlo" class="">= useInt A</font></div><div><font face="Menlo" class="">= useInt A</font></div><div>```</div><div><br class=""></div><div>Let’s assume that greedy works the way it is intended. I.e., it assigns A to the int register bank because there are 2 such uses vs. only 1 fp bank use:</div><div><div>```</div><div><font face="Menlo" class="">A<int> = def</font></div><div><font face="Menlo" class="">= useFP A<int> <— Next, RBS looks at this one</font></div><div><font face="Menlo" class="">= useInt A<int></font></div><div><font face="Menlo" class="">= useInt A<int></font></div><div class="">```</div></div><div><br class=""></div>Now the regbank for useFP is not right and has to be repaired. Right now, we will insert the costly cross copy for that use:</div><div><div><div>```</div><div><font face="Menlo" class="">A<int> = def</font></div><div><font face="Menlo" class="">AFP<fp> = cross_copy A<int></font></div><div><font face="Menlo" class="">= useFP AFP<fp></font></div><div><font face="Menlo" class="">= useInt A<int></font></div><div><font face="Menlo" class="">= useInt A<int></font></div><div class="">```</div><div class=""><br class=""></div><div class="">Now, if we were to change the definition of A to avoid this copy we would create two costly copy for the useInt. Actually, another question is what would we do when we look at the first useInt? Is this use allowed to change the definition of the instruction again? Do we duplicate the definition? Etc..</div><div class=""><div>```</div><div><font face="Menlo" class="">A<fp> = def <— reassign</font></div><div><span style="font-family: Menlo;" class="">= useFP AFP<fp></span></div><div><font face="Menlo" class="">AInt1<int> = cross_copy A<fp></font></div><div><font face="Menlo" class="">= useInt AInt1<int></font></div><div><font face="Menlo" class="">AInt2<int> = cross_copy A<fp></font></div><div><font face="Menlo" class="">= useInt AInt2<int></font></div><div class="">```</div></div><div class=""><br class=""></div><div class="">Bottom line, if we allow to modify the assignments of the definition, the decision making is not local anymore and in particular may require to add repairing code all over the place. As a result the cost model becomes much more complicated.</div><div class=""><br class=""></div></div><blockquote type="cite" class=""><div class=""><div class=""><br class="">Matt suggested that RegBankSelect should probably go bottom-up instead and I agree with him. I don't think there is a particular reason why RegBankSelect necessarily has to go top-down.<br class=""></div></div></blockquote><div><br class=""></div><div>The rationale for going top-down is that when you reach an instruction, all the operands are assigned so you know what would be the cost of repairing.</div><div>You could said that the problem is the same for definitions when going bottom-up. However, this is likely to be more problematic because usually you have fewer definitions than arguments on each individual instruction, therefore there is more guess work going on (e.g., top-down, you assume a cost for 1 definition, going bottom-up you have to assume a cost for 2 arguments or more precisely you would have to track a window of X instructions for 2 arguments instead of 1 definition.)</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class="">I'm not too familiar with the implementation of RegBankSelect. Would it be a big effort to make it work bottom-up instead? I'm guessing one of the biggest areas would be the check whether a cross-bank copy is needed as well as calculating the overall cost for alternative mappings as now all usages of the current instruction would have to be checked instead of the much more limited operands. How big of an impact would this have?<br class=""></div></div></blockquote><div><br class=""></div><div>It’s been a while since I looked at the implementation but I would expect this to be significant.</div><div><br class=""></div><div>I think we should step back and check what we want before investing any time in some rewrite. For instance, I don’t see what bottom-up fundamentally gives us. It seems like a workaround to me.</div><div>That said, it would work either way!</div><div><br class=""></div><div>Cheers,</div><div>-Quentin</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class="">Cheers,<br class=""><br class="">Dominik<br class=""><br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<br class=""></div></div></blockquote></div><br class=""></div></body></html>