<div dir="ltr">On Thu, Sep 5, 2013 at 12:15 PM, Richard Hadsell <span dir="ltr"><<a href="mailto:hadsell@blueskystudios.com" target="_blank">hadsell@blueskystudios.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">We have been comparing the performance of code generated by Clang++ 3.3 with G++ 4.5.1.  The results have been mixed.<br>


<br>

We ran a profiler to look for what could cause some cases to run slower with Clang++ and found that some floating-point routines were taking a lot of time:<br>

<br>

samples  %        image name     symbol name<br>

596677   19.7935  studio++       gcopy2<br>

274870    9.1182  libm-2.13.so   feholdexcept<br>

262358    8.7032  libm-2.13.so   fesetenv<br>

258225    8.5661  studio++       cgi...<br>

207915    6.8971  libm-2.13.so   fesetround<br>

193316    6.4129  studio++       dcopy2<br>

126933    4.2107  libm-2.13.so   __ieee754_exp2<br>

122614    4.0675  studio++       fcopy2<br>

<br>

For g++ the top contributors were these:<br>

<br>

samples  %        image name     symbol name<br>

466893   21.3064  studio++       gcopy2<br>

300240   13.7013  studio++       cgi...<br>

176191    8.0404  studio++       dcopy2<br>

132491    6.0462  studio++       cgi...<br>

129580    5.9133  libm-2.13.so   __ieee754_pow<br>

126938    5.7928  studio++       ecopy2<br>

119610    5.4583  studio++       fcopy2<br>

<br>

The libm floating-point routines 'fe...' only show up with Clang++, so I suspect they account for the slower performance.<br>

<br>

We are not purposely changing the floating-point precision or rounding mode, so I am looking for a way to avoid code that uses these functions unnecessarily.<br>

<br>

We are compiling with these options:<br>

<br>

-march=core2 -msse4.1 -m64 -std=c++0x -fPIC -pthread -gcc-toolchain /opt/gcc-4.7.2 -Wno-logical-op-parentheses -Wno-shift-op-parentheses -O2<span class=""><font color="#888888"><br>

<br></font></span></blockquote><div><br></div><div>There isn't any obvious reason why feholdexcept and the other fe... routines would be called from clang-compiled code but not from gcc-compiled code; clang never generates calls to these functions implicitly.</div>

<div><br></div><div>Can you hop into a debugger and get a stack trace from a call to feholdexcept?</div></div><br></div><div class="gmail_extra">-Eli</div></div>
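<div dir="ltr"><br>For reference when reading such a stack trace: feholdexcept, fesetround and fesetenv are the standard &lt;cfenv&gt; functions, and they usually appear together as a save / change / restore sequence around a computation. The following is only a minimal sketch of that pattern, under the assumption that some caller in the trace does something similar. The helper name divide_rounding_up is hypothetical and is not taken from studio++ or from libm.</div>

```cpp
#include <cfenv>

// Hypothetical helper (not from the original code base): performs one
// division with the rounding mode forced toward +infinity, then restores
// the caller's floating-point environment.
double divide_rounding_up(double num, double den) {
    std::fenv_t saved;

    // Save the current environment, clear the exception status flags,
    // and switch to non-stop mode (exceptions do not trap).
    std::feholdexcept(&saved);

    // Change the rounding mode for the computation below.
    std::fesetround(FE_UPWARD);

    // volatile keeps the compiler from constant-folding the division or
    // moving it outside the save/restore window.
    volatile double n = num;
    volatile double d = den;
    volatile double result = n / d;

    // Restore the saved environment, including the original rounding mode.
    std::fesetenv(&saved);
    return result;
}
```

<div dir="ltr">If the stack trace shows these three calls being made from inside a single libm routine, a wrapper of this shape inside that routine would be a plausible explanation for their near-equal sample counts in the profile above.</div>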