[cfe-dev] Floating-point performance question

Richard Hadsell hadsell at blueskystudios.com
Thu Sep 5 13:44:20 PDT 2013


On 09/05/2013 04:26 PM, Halfdan Ingvarsson wrote:
> The same applies to exp2f, btw, since the two have very similar implementations.
>
>  - ½
>
> On 13-09-05 03:55 PM, Halfdan Ingvarsson wrote:
>> glibc's expf() function changes the FP rounding mode on every call -- those are the fe* calls you're seeing -- resulting in dreadful performance (IIRC there's a pipeline stall when the rounding mode changes).
>>
>> Have a look at sysdeps/ieee754/flt-32/e_expf.c in the glibc sources to verify. This is true as of glibc 2.14, at least.
>>
>> We had to roll our own to work around it.
>>
>>  - ½
>>
>> On 13-09-05 03:33 PM, Stephen Canon wrote:
>>> On Sep 5, 2013, at 12:20 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>
>>>> On Thu, Sep 5, 2013 at 12:15 PM, Richard Hadsell <hadsell at blueskystudios.com> wrote:
>>>>
>>>>     We have been comparing the performance of code generated by Clang++ 3.3 with G++ 4.5.1.  The results have been mixed.
>>>>
>>>>     We ran a profiler to look for what could cause some cases to run slower with Clang++ and found that some floating-point routines were taking a lot of time:
>>>>
>>>>     samples  %        image name     symbol name
>>>>     596677   19.7935  studio++       gcopy2
>>>>     274870    9.1182  libm-2.13.so   feholdexcept
>>>>     262358    8.7032  libm-2.13.so   fesetenv
>>>>     258225    8.5661  studio++       cgi...
>>>>     207915    6.8971  libm-2.13.so   fesetround
>>>>     193316    6.4129  studio++       dcopy2
>>>>     126933    4.2107  libm-2.13.so   __ieee754_exp2
>>>>     122614    4.0675  studio++       fcopy2
>>>>
>>>>     For g++ the top contributors were these:
>>>>
>>>>     samples  %        image name     symbol name
>>>>     466893   21.3064  studio++       gcopy2
>>>>     300240   13.7013  studio++       cgi...
>>>>     176191    8.0404  studio++       dcopy2
>>>>     132491    6.0462  studio++       cgi...
>>>>     129580    5.9133  libm-2.13.so   __ieee754_pow
>>>>     126938    5.7928  studio++       ecopy2
>>>>     119610    5.4583  studio++       fcopy2
>>>>
>>>>     The libm floating-point routines 'fe...' only show up with Clang++, so I suspect they account for the slower performance.
>>>>
>>>>     We are not purposely changing the floating-point precision or rounding mode, so I am looking for a way to avoid code that uses these functions unnecessarily.
>>>>
>>>>     We are compiling with these options:
>>>>
>>>>     -march=core2 -msse4.1 -m64 -std=c++0x -fPIC -pthread -gcc-toolchain /opt/gcc-4.7.2 -Wno-logical-op-parentheses -Wno-shift-op-parentheses -O2
>>>>
>>>>
>>>> There isn't any obvious reason why feholdexcept etc. would be called from clang-compiled code, but not gcc-compiled code; clang never generates calls to it implicitly.
>>>>
>>>> Can you hop into a debugger and get a stack trace from a call to feholdexcept?
>>>
>>> Usually the reason these symbols show up on Linux is that you're hitting the errno-versions of the libm entry points (i.e. GCC is likely generating calls to a different set of more streamlined libm entry points, while clang is hitting the default versions).
>>>
>>>
>
Thanks for all the clues.  Here is the stack trace:

  feholdexcept,
  __ieee754_exp2,
  exp2,
  _ZN9cgi...
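
For anyone reading along, the three fe* symbols in the profile fit the usual save/set/restore pattern for the floating-point environment, which is what Halfdan described. The following is only a sketch of that pattern under my own assumptions, not the glibc source; 'with_round_to_nearest' is a hypothetical name made up for illustration.

```cpp
#include <cfenv>

// Hypothetical wrapper (not taken from glibc): run `body` under
// round-to-nearest with FP exceptions held, then restore the caller's
// floating-point environment.
template <typename F>
double with_round_to_nearest(F body) {
    std::fenv_t saved;
    std::feholdexcept(&saved);      // save the environment, clear the flags
    std::fesetround(FE_TONEAREST);  // force a known rounding mode
    double result = body();         // the actual computation goes here
    std::fesetenv(&saved);          // restore rounding mode and flags
    return result;
}
```

If a math routine does something like this on every call, each call pays for three extra libm calls, which would match the feholdexcept, fesetround and fesetenv entries in the profile.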

Based on your various hints, I'm guessing that our code 'pow (2.0, x)' is being optimized to 'exp2 (x)' by Clang++ but not by G++.  We will try using exp2 explicitly and see what happens with the G++ version.
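
To make the experiment concrete, here is a minimal sketch of the two forms we plan to compare. The function names are hypothetical and not from our code base; the last helper only checks that the two forms agree numerically, so that swapping one for the other should not change our results.

```cpp
#include <cmath>

// Hypothetical helper names, for illustration only.

// The form as written in our source today; an optimizer may
// rewrite this call to exp2(x).
double pow2_via_pow(double x) {
    return std::pow(2.0, x);
}

// The explicit form: asks for exp2 directly, whichever compiler is used.
double pow2_via_exp2(double x) {
    return std::exp2(x);
}

// Relative difference between the two forms for a given x.
double pow2_relative_difference(double x) {
    double a = pow2_via_pow(x);
    double b = pow2_via_exp2(x);
    return std::fabs(a - b) / std::fabs(b);
}
```

Building the second form with both compilers and profiling it should show whether the fe* overhead follows the exp2 call rather than the compiler.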

Perhaps we are running into a floating-point standards issue that our old version of G++ is ignoring.

We'll continue investigating tomorrow.