<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">[Merging two email replies to make this thread easier to follow.] <br><div><div>On Sep 26, 2008, at 7:36 PM, Duncan Sands wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>Hi Devang,<br><br><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">If XYZ calls S and NS then once again, XYZ's notes win.<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">And could result in a huge performance loss.  And it is a loss:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">it was ok to run S using sse instructions (that's why the<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">function was marked "sse"!), but now sse isn't being used due<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">to inlining...<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">... this happens only because XYZ is marked as x86.no-sse. In which <br></blockquote><blockquote type="cite">case, it is not a performance loss at all.<br></blockquote><br>I don't understand what you are saying here.  Suppose XYZ is no-sse.<br>It calls S which is marked sse and does a lot of floating point<br>computation (but doesn't use sse intrinsics).  If I understand you<br>right, the inliner can inline S into XYZ.  </div></blockquote><div><br></div><div>I think you misunderstood...</div><div><br></div><div>"So, inline S into ...<b> only if</b> code generator will not be forced to use SSE instructions for the code copied from S."  Here "only if" is the important part :)</div><div><br></div><div>Later I mentioned, "The inliner needs to know the LLVM IR for function S does not use SSE intrinsics in this case. 
The inliner needs to detect SSE uses at IR level."</div><div><br></div><div>If the inliner cannot detect this, or chooses not to, then it should not inline S into XYZ in this case. That much is obvious.</div><div><br></div><div>On Sep 26, 2008, at 7:48 PM, Duncan Sands wrote:</div><div><br class="Apple-interchange-newline"><blockquote type="cite"><span class="Apple-style-span" style="color: rgb(0, 0, 0); ">I'm talking about this case:<br> gcc -c -O4 -no-sse x.c &lt;= sse explicitly turned off<br> gcc -c -O4 -sse y.c<br> gcc -o x x.o y.o<br>Here you would still happily inline B into A, while my<br>scheme would not.  <br></span></blockquote><div><br></div>No, you misunderstood my scheme. See above.</div><div><br></div><div>We have an extensively supported scenario where people use runtime checks to run specially optimized routines on certain processors (G3 vs. AltiVec code). It is OK for the inliner to inline non-AltiVec code into a specialized AltiVec routine. However, inlining a function that uses AltiVec instructions into a function that is expected to run on a G3 is a bad idea. See how uses_vector is handled in the GCC inliner code in llvm-gcc. In the x86 world, we have regularly received requests for processor-specialized routines, where the appropriate one is selected at runtime. I'm told that ICC supports this.</div><div><br></div><div>The function attributes (notes are now implemented as attributes) must be handled case by case. We should not put a vanilla check in the inliner that says "if the attributes do not match, then skip inlining." If we support optspeed (optimize for speed), then optspeed vs. optsize makes this obvious.</div><div>-</div><div>Devang</div><div><br></div><br><blockquote type="cite"><div>This results in all these<br>floating point computations being done as "no-sse", i.e. using the<br>good 'ol x86 floating point stack rather than the much more efficient<br>sse registers...<br><br>Ciao,<br><br>Duncan.<br></div></blockquote></div><br></body></html>