[llvm-commits] r55638 - /llvm/trunk/include/llvm/Function.h
dpatel at apple.com
Mon Sep 29 09:19:35 PDT 2008
[ merging two email replies to make it easier to follow this thread.]
On Sep 26, 2008, at 7:36 PM, Duncan Sands wrote:
> Hi Devang,
>>>> If XYZ calls S and NS then once again, XYZ's notes win.
>>> And could result in a huge performance loss. And it is a loss:
>>> it was ok to run S using sse instructions (that's why the
>>> function was marked "sse"!), but now sse isn't being used due
>>> to inlining...
>> ... this happens only if because XYZ is marked as x86.no-sse. In
>> case, it is not a performance loss at all.
> I don't understand what you are saying here. Suppose XYZ is no-sse.
> It calls S which is marked sse and does a lot of floating point
> computation (but doesn't use sse intrinsics). If I understand you
> right, the inliner can inline S into XYZ.
I think you misunderstood ...
"So, inline S into ... only if code generator will not be forced to
use SSE instructions for the code copied from S." Here "only if" is
Later I mentioned, "The inliner needs to know the LLVM IR for function
S does not use SSE intrinsics in this case. The inliner needs to
detect SSE uses at IR level."
If the inliner cannot detect this, or decides not to detect this, then
it should not inline S into XYZ in this case. It is obvious.
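To make the rule above concrete, here is a rough sketch of the decision. It is a toy model under stated assumptions, not LLVM code: SseNote, FunctionInfo, usesSseIntrinsics and mayInline are all hypothetical names, and the "scan" of the IR is represented only by its result.

```cpp
#include <optional>

// Hypothetical toy model of the rule; none of these names are LLVM APIs.
enum class SseNote { None, Sse, NoSse };

struct FunctionInfo {
    SseNote note;
    // Result of scanning the function's IR for SSE intrinsics.
    // std::nullopt means the inliner could not, or chose not to, scan.
    std::optional<bool> usesSseIntrinsics;
};

// Decide whether the callee's body may be copied into the caller.
bool mayInline(const FunctionInfo& caller, const FunctionInfo& callee) {
    // The caller's notes win. If the caller does not forbid SSE,
    // this particular check has nothing to object to.
    if (caller.note != SseNote::NoSse)
        return true;

    // The caller is no-sse. Inline only if we know the copied code
    // will not force the code generator to use SSE instructions.
    if (!callee.usesSseIntrinsics.has_value())
        return false;  // cannot tell, so do not inline
    return !*callee.usesSseIntrinsics;
}
```

With this shape, a no-sse XYZ accepts an S whose body is known to be free of SSE intrinsics, and rejects S whenever the scan was skipped or found an SSE use.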
On Sep 26, 2008, at 7:48 PM, Duncan Sands wrote:
> I'm talking about this case:
> gcc -c -O4 -no-sse x.c <= sse explicitly turned off
> gcc -c -O4 -sse y.c
> gcc -o x x.o y.o
> Here you would still happily inline B into A, while my
> scheme would not.
No, you misunderstood my scheme. See above.
We have extensively supported the scenario where people use runtime
checks to run specially optimized routines for certain processors (G3
vs. Altivec code). It is ok if the inliner inlines non-altivec code
into a specialized altivec routine. However, inlining a function that
uses altivec instructions into a function that is expected to run on G3
is a bad idea. See uses_vector in llvm-gcc's gcc inliner code. We have
regularly received requests for specialized routines for processors in
the x86 world, where the appropriate one is selected at runtime. I'm
told that ICC supports this.
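For readers unfamiliar with this pattern, the following is a minimal sketch of runtime selection between a generic routine and a specialized one. All names here (SumFn, sumScalar, sumVector, selectSum, cpuHasVectorUnit) are hypothetical, and the "vector" routine is a plain C++ stand-in rather than real Altivec or SSE code; a real program would obtain the flag from a CPU feature query.

```cpp
#include <cstddef>

// Hypothetical names throughout; this only illustrates the dispatch pattern.
using SumFn = float (*)(const float*, std::size_t);

// Generic routine: must stay safe to run on the baseline processor (e.g. G3).
float sumScalar(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

// Stand-in for the specialized routine. A real version would use vector
// instructions; here we only mimic the four-lanes-per-step structure.
float sumVector(const float* data, std::size_t n) {
    float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            lanes[k] += data[i + k];
    float total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; ++i)  // leftover elements
        total += data[i];
    return total;
}

// Runtime check picks the routine; the flag is assumed to come from
// a CPU feature query performed elsewhere.
SumFn selectSum(bool cpuHasVectorUnit) {
    return cpuHasVectorUnit ? sumVector : sumScalar;
}
```

The inlining concern is the asymmetry: copying sumScalar into sumVector is harmless, but copying vector code into a routine that must run on the baseline processor is not.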
The function attributes (notes are now implemented as attributes)
must be handled case by case. We should not put a vanilla check in the
inliner that says: if attributes do not match, then skip. If we support
optspeed (optimize for speed), then optspeed vs optsize makes this
> This results in all these
> floating point computations being done as "no-sse", i.e. using the
> good 'ol x86 floating point stack rather than the much more efficient
> sse registers...