<div dir="ltr">Hi Quentin,<br><div class="gmail_extra"><br clear="all"><div>2014-10-02 0:21 GMT+07:00 Quentin Colombet <span dir="ltr">&lt;<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>&gt;</span>:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><span class=""><blockquote type="cite"><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div><div><div><div><blockquote type="cite"><div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">The constant hoisting pass does this kind of thing. Should we try to teach it to handle this kind of case?</span></div></blockquote></div></div></div></div></div></div></div></blockquote><div>That would be interesting. However, this pass is x86-specific and can use processor features (subregister structure, loading a 64-bit value with a 32-bit move). Can these features be used by constant hoisting? </div></div></div></div></div></blockquote><div><br></div></span><div>Maybe. This pass has a bunch of target hooks if I remember correctly. 
Juergen would know better :).</div></div></div></div></blockquote><div><br></div><div>OK, looking forward to the advice :)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><span class=""><blockquote type="cite"><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div><div><div><div><blockquote type="cite"><div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">Moreover, this may be beneficial for code size, but I guess it is generally not beneficial for performance. Therefore, I believe this should be done for functions with the Os or Oz attributes only.</span></div></blockquote></div></div></div></div></div></div></div></blockquote><div>Just curious, why? 
Moves from a register must be faster than moves from memory.</div></div></div></div></div></blockquote><div><br></div></span><div>Yes, but those are moves from an immediate, which do not require memory at all.</div></div></div></div></blockquote><div><br></div><div>On the other hand, it loads the memory bus.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div>My performance concerns are:</div><div>- Register pressure, like Rafael mentioned.</div><div>- Additional scheduling dependencies.</div><div><br></div><div>Going back to your example:</div><div>This yields two independent chains of computation that can be scheduled independently. Moreover, you need just one register to realize this sequence.</div><div><span class="">  mov $0, 0x4(%esi)<br>   mov $0, 0x8(%esi)<br><br></span>The two sequences of computation now have to wait for the first mov immediate. Moreover, this sequence requires 2 registers.<span class=""><br>   mov $0, %eax<br>   mov %eax, 0x4(%esi)<br>   mov %eax, 0x8(%esi)</span></div><div><br></div></div></div></div></blockquote><div><br></div><div>Looks reasonable, thank you for the explanation.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div></div><span class=""><br><blockquote type="cite"><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> Both gcc and icc use moves from a register when compiling with optimization.  </div></div></div></div></div></blockquote><div><br></div></span><div>Sure. What I am saying is that generally speaking, trading an immediate-to-register copy against a register-to-register copy does not sound beneficial to me. 
</div><div><br></div><div>Apart from code size improvements, what kind of improvements are you seeing?</div></div></div></div></blockquote><div> </div><div>The main goal was code size. In fact, the main interest is optimizing memset; the problem is described in <a href="http://llvm.org/bugs/show_bug.cgi?id=5124">http://llvm.org/bugs/show_bug.cgi?id=5124</a>.  </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div>Also, how big are those improvements?</div><div><br></div></div></div></div></blockquote><div><br></div><div>Compiling the PHP distribution with and without this pass shows a size reduction of about 0.3%.</div><div><br></div><div><br></div><div> --Serge</div></div></div></div>