<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Jan 3, 2011, at 11:30 PM, Jakob Stoklund Olesen wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>I noticed that we generate code like this for i386 PIC:<br><br><span class="Apple-tab-span" style="white-space:pre">    </span>calll<span class="Apple-tab-span" style="white-space:pre">       </span>L0$pb<br>L0$pb:<br><span class="Apple-tab-span" style="white-space:pre">     </span>popl<span class="Apple-tab-span" style="white-space:pre">        </span>%eax<br><span class="Apple-tab-span" style="white-space:pre">      </span>movl<span class="Apple-tab-span" style="white-space:pre">        </span>%eax, -24(%ebp)         ## 4-byte Spill<br><br>I worry that this defeats the return address prediction for returns in the function because calls and returns no longer are matched.<br></div></blockquote></div><br><div>Yes, this will defeat the processor's return address stack predictor. Even so, I suspect it's not much of an issue on "desktop" processors: the reissue of the pop is specific to Atom, so elsewhere you only need to worry about the branch misprediction on the next return. Assuming these sequences aren't <i>too</i> frequent, the more elaborate tournament predictors in more powerful processors may be able to compensate for it.</div><div><br></div><div>That said, the alternative sequence you propose seems like it would be an improvement on any processor with a multiple-issue pipeline (unless ret does a lot more work than I think it does), though it doesn't fix the reissued-pop problem on Atom.</div><div><br></div><div>--Owen</div></body></html>