[cfe-dev] Floating-point performance question
Halfdan Ingvarsson
halfdan at sidefx.com
Thu Sep 5 13:26:45 PDT 2013
The same applies to exp2f, btw, since the two have very similar
implementations.
- ½
On 13-09-05 03:55 PM, Halfdan Ingvarsson wrote:
> glibc's expf() function changes the FP rounding mode on every call --
> that is where the fe* calls you're seeing come from -- resulting in
> dreadful performance (IIRC there's a pipeline stall when the rounding
> mode changes).
>
> Have a look at sysdeps/ieee754/flt-32/e_expf.c in the glibc sources to
> verify. This is true as of glibc 2.14, at least.
>
> We had to roll our own to work around it.
>
> - ½
>
> On 13-09-05 03:33 PM, Stephen Canon wrote:
>> On Sep 5, 2013, at 12:20 PM, Eli Friedman <eli.friedman at gmail.com>
>> wrote:
>>
>>> On Thu, Sep 5, 2013 at 12:15 PM, Richard Hadsell
>>> <hadsell at blueskystudios.com> wrote:
>>>
>>> We have been comparing the performance of code generated by
>>> Clang++ 3.3 with G++ 4.5.1. The results have been mixed.
>>>
>>> We ran a profiler to look for what could cause some cases to run
>>> slower with Clang++ and found that some floating-point routines
>>> were taking a lot of time:
>>>
>>> samples  %        image name    symbol name
>>> 596677   19.7935  studio++      gcopy2
>>> 274870    9.1182  libm-2.13.so  feholdexcept
>>> 262358    8.7032  libm-2.13.so  fesetenv
>>> 258225    8.5661  studio++      cgi...
>>> 207915    6.8971  libm-2.13.so  fesetround
>>> 193316    6.4129  studio++      dcopy2
>>> 126933    4.2107  libm-2.13.so  __ieee754_exp2
>>> 122614    4.0675  studio++      fcopy2
>>>
>>> For g++ the top contributors were these:
>>>
>>> samples  %        image name    symbol name
>>> 466893   21.3064  studio++      gcopy2
>>> 300240   13.7013  studio++      cgi...
>>> 176191    8.0404  studio++      dcopy2
>>> 132491    6.0462  studio++      cgi...
>>> 129580    5.9133  libm-2.13.so  __ieee754_pow
>>> 126938    5.7928  studio++      ecopy2
>>> 119610    5.4583  studio++      fcopy2
>>>
>>> The libm floating-point routines 'fe...' only show up with
>>> Clang++, so I suspect they account for the slower performance.
>>>
>>> We are not purposely changing the floating-point precision or
>>> rounding mode, so I am looking for a way to avoid code that uses
>>> these functions unnecessarily.
>>>
>>> We are compiling with these options:
>>>
>>> -march=core2 -msse4.1 -m64 -std=c++0x -fPIC -pthread
>>> -gcc-toolchain /opt/gcc-4.7.2 -Wno-logical-op-parentheses
>>> -Wno-shift-op-parentheses -O2
>>>
>>>
>>> There isn't any obvious reason why feholdexcept etc. would be called
>>> from clang-compiled code, but not gcc-compiled code; clang never
>>> generates calls to it implicitly.
>>>
>>> Can you hop into a debugger and get a stack trace from a call to
>>> feholdexcept?
>>
>> Usually the reason these symbols show up on linux is that you're
>> hitting the errno-versions of the libm entry points (i.e. GCC is
>> likely generating calls to a different set of more streamlined libm
>> entry points, while clang is hitting the default versions).
>>
>>