<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 09/05/2013 04:26 PM, Halfdan Ingvarsson wrote:<br>
<blockquote cite="mid:5228E905.3030003@sidefx.com" type="cite">
<div class="moz-cite-prefix">Same applies to exp2f, btw, since
they have very similar implementations.<br>
<br>
- ½<br>
<br>
On 13-09-05 03:55 PM, Halfdan Ingvarsson wrote:<br>
</div>
<blockquote cite="mid:5228E1A4.9040300@sidefx.com" type="cite">
<div class="moz-cite-prefix">glibc's expf() function changes the
FP rounding mode on every call -- those are the fe* calls
you're seeing -- resulting in dreadful performance (IIRC
there's a pipeline stall whenever the rounding mode changes).<br>
<br>
Have a look at sysdeps/ieee754/flt-32/e_expf.c in the glibc
sources to verify. This is true as of glibc 2.14, at least.<br>
<br>
We had to roll our own to work around it.<br>
<br>
- ½<br>
<br>
On 13-09-05 03:33 PM, Stephen Canon wrote:<br>
</div>
<blockquote
cite="mid:894741D6-06A5-473D-883F-083548EAED9D@apple.com"
type="cite">
<div>On Sep 5, 2013, at 12:20 PM, Eli Friedman &lt;<a
moz-do-not-send="true"
href="mailto:eli.friedman@gmail.com">eli.friedman@gmail.com</a>&gt;
wrote:</div>
<div><br class="Apple-interchange-newline">
<blockquote type="cite">
<div dir="ltr">On Thu, Sep 5, 2013 at 12:15 PM, Richard
Hadsell <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:hadsell@blueskystudios.com"
target="_blank">hadsell@blueskystudios.com</a>&gt;</span>
wrote:<br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">We
have been comparing the performance of code
generated by Clang++ 3.3 with G++ 4.5.1. The
results have been mixed.<br>
<br>
We ran a profiler to look for what could cause
some cases to run slower with Clang++ and found
that some floating-point routines were taking a
lot of time:<br>
<br>
<pre>samples  %        image name    symbol name
596677   19.7935  studio++      gcopy2
274870    9.1182  libm-2.13.so  feholdexcept
262358    8.7032  libm-2.13.so  fesetenv
258225    8.5661  studio++      cgi...
207915    6.8971  libm-2.13.so  fesetround
193316    6.4129  studio++      dcopy2
126933    4.2107  libm-2.13.so  __ieee754_exp2
122614    4.0675  studio++      fcopy2
</pre>
<br>
For g++ the top contributors were these:<br>
<br>
<pre>samples  %        image name    symbol name
466893   21.3064  studio++      gcopy2
300240   13.7013  studio++      cgi...
176191    8.0404  studio++      dcopy2
132491    6.0462  studio++      cgi...
129580    5.9133  libm-2.13.so  __ieee754_pow
126938    5.7928  studio++      ecopy2
119610    5.4583  studio++      fcopy2
</pre>
<br>
The libm floating-point routines 'fe...' only show
up with Clang++, so I suspect they account for the
slower performance.<br>
<br>
We are not purposely changing the floating-point
precision or rounding mode, so I am looking for a
way to avoid code that uses these functions
unnecessarily.<br>
<br>
We are compiling with these options:<br>
<br>
-march=core2 -msse4.1 -m64 -std=c++0x -fPIC
-pthread -gcc-toolchain /opt/gcc-4.7.2
-Wno-logical-op-parentheses
-Wno-shift-op-parentheses -O2<br>
<br>
</blockquote>
<div><br>
</div>
<div>There isn't any obvious reason why feholdexcept
etc. would be called from clang-compiled code, but
not gcc-compiled code; clang never generates calls
to it implicitly.</div>
<div><br>
</div>
<div>Can you hop into a debugger and get a stack
trace from a call to feholdexcept?</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
<div>
<div>Usually these symbols show up on Linux because
you’re hitting the errno-setting versions of the libm entry
points (i.e. GCC is likely generating calls to a different,
more streamlined set of libm entry points, while clang is
hitting the default versions).</div>
<div><br>
</div>
<br>
</div>
</blockquote>
</blockquote>
<br>
</blockquote>
Thanks for all the clues. Here is the stack trace:<br>
<pre> feholdexcept,
__ieee754_exp2,
exp2,
_ZN9cgi...
</pre>
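That trace matches the fe* pattern Halfdan described. As I understand it, the sequence is roughly the sketch below. This is not glibc's source: guarded_exp2 is a hypothetical name, and std::exp2 only stands in for the real kernel. It just shows the save / force round-to-nearest / restore steps that would account for feholdexcept, fesetround and fesetenv appearing once per call.<br>

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical wrapper, NOT glibc's implementation. It only mimics the
// save / set-rounding / restore sequence that the profile suggests.
double guarded_exp2(double x) {
    std::fenv_t saved;
    std::feholdexcept(&saved);      // save the FP environment and clear the exception flags
    std::fesetround(FE_TONEAREST);  // force round-to-nearest for the computation
    double r = std::exp2(x);        // stand-in for the actual exp2 kernel
    std::fesetenv(&saved);          // restore the caller's environment (rounding mode included)
    return r;
}
```

If that is what happens, every exp2 call pays for three extra libm calls, which would fit the three fe* rows in the Clang++ profile.<br>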
Based on your various hints, I'm guessing that Clang++ is optimizing
our 'pow (2.0, x)' calls to 'exp2 (x)' and G++ is not. We
will try calling exp2 explicitly and see what happens with the G++
version.<br>
<br>
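A minimal sketch of the comparison we have in mind, assuming two hypothetical helpers (scale_pow and scale_exp2 are illustrative names, not from our code base). Built with the same flags under both compilers, the first should show whether the optimizer turns the pow call into exp2, and the second forces the exp2 path either way:<br>

```cpp
#include <cmath>

// Hypothetical helpers for illustration only.

// Our current form: an optimizer may rewrite this call as exp2(x).
double scale_pow(double x) {
    return std::pow(2.0, x);
}

// The explicit form we plan to try: always calls exp2.
double scale_exp2(double x) {
    return std::exp2(x);
}
```

Looking at the generated assembly or the profile for each helper should tell us which libm entry point is actually reached.<br>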
Perhaps we are running into a floating-point standards issue that
our old version of G++ is ignoring.<br>
<br>
We'll continue investigating tomorrow.<br>
</body>
</html>