<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">The same applies to exp2f, btw, since the
      two have very similar implementations.<br>
      <br>
       - &frac12;<br>
      <br>
      On 13-09-05 03:55 PM, Halfdan Ingvarsson wrote:<br>
    </div>
    <blockquote cite="mid:5228E1A4.9040300@sidefx.com" type="cite">
      <div class="moz-cite-prefix">glibc's expf() function changes the
        FP rounding mode on every call -- those are the fe* calls you're
        seeing -- resulting in dreadful performance (IIRC there's a
        pipeline stall when the rounding mode changes).<br>
        <br>
        Have a look at sysdeps/ieee754/flt-32/e_expf.c in the glibc
        sources to verify. This is true as of glibc 2.14, at least.<br>
        <br>
        We had to roll our own to work around it.<br>
        <br>
         - &frac12;<br>
        <br>
        On 13-09-05 03:33 PM, Stephen Canon wrote:<br>
      </div>
      <blockquote
        cite="mid:894741D6-06A5-473D-883F-083548EAED9D@apple.com"
        type="cite">
        <div>On Sep 5, 2013, at 12:20 PM, Eli Friedman &lt;<a
            moz-do-not-send="true" href="mailto:eli.friedman@gmail.com">eli.friedman@gmail.com</a>&gt;
          wrote:</div>
        <div><br class="Apple-interchange-newline">
          <blockquote type="cite">
            <div dir="ltr">On Thu, Sep 5, 2013 at 12:15 PM, Richard
              Hadsell <span dir="ltr">&lt;<a moz-do-not-send="true"
                  href="mailto:hadsell@blueskystudios.com"
                  target="_blank">hadsell@blueskystudios.com</a>&gt;</span>
              wrote:<br>
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">We

                    have been comparing the performance of code
                    generated by Clang++ 3.3 with G++ 4.5.1.  The
                    results have been mixed.<br>
                    <br>
                    We ran a profiler to look for what could cause some
                    cases to run slower with Clang++ and found that some
                    floating-point routines were taking a lot of time:<br>
                    <br>
                    samples  %        image name     symbol name<br>
                    596677   19.7935  studio++       gcopy2<br>
                    274870    9.1182  libm-2.13.so   feholdexcept<br>
                    262358    8.7032  libm-2.13.so   fesetenv<br>
                    258225    8.5661  studio++       cgi...<br>
                    207915    6.8971  libm-2.13.so   fesetround<br>
                    193316    6.4129  studio++       dcopy2<br>
                    126933    4.2107  libm-2.13.so   __ieee754_exp2<br>
                    122614    4.0675  studio++       fcopy2<br>
                    <br>
                    For g++ the top contributors were these:<br>
                    <br>
                    samples  %        image name     symbol name<br>
                    466893   21.3064  studio++       gcopy2<br>
                    300240   13.7013  studio++       cgi...<br>
                    176191    8.0404  studio++       dcopy2<br>
                    132491    6.0462  studio++       cgi...<br>
                    129580    5.9133  libm-2.13.so   __ieee754_pow<br>
                    126938    5.7928  studio++       ecopy2<br>
                    119610    5.4583  studio++       fcopy2<br>
                    <br>
                    The libm floating-point routines 'fe...' only show
                    up with Clang++, so I suspect they account for the
                    slower performance.<br>
                    <br>
                    We are not purposely changing the floating-point
                    precision or rounding mode, so I am looking for a
                    way to avoid code that uses these functions
                    unnecessarily.<br>
                    <br>
                    We are compiling with these options:<br>
                    <br>
                    -march=core2 -msse4.1 -m64 -std=c++0x -fPIC -pthread
                    -gcc-toolchain /opt/gcc-4.7.2
                    -Wno-logical-op-parentheses
                    -Wno-shift-op-parentheses -O2<br>
                    <br>
                  </blockquote>
                  <div><br>
                  </div>
                  <div>There isn't any obvious reason why feholdexcept
                    etc. would be called from clang-compiled code but
                    not gcc-compiled code; clang never generates calls
                    to them implicitly.</div>
                  <div><br>
                  </div>
                  <div>Can you hop into a debugger and get a stack trace
                    from a call to feholdexcept?</div>
                </div>
              </div>
            </div>
          </blockquote>
          <br>
        </div>
        <div>
          <div>Usually the reason these symbols show up on Linux is that
            you’re hitting the errno-setting versions of the libm entry
            points (i.e. GCC is likely generating calls to a different
            set of more streamlined libm entry points, while clang is
            hitting the default versions).</div>
          <div><br>
          </div>
          <br>
        </div>
      </blockquote>
    </blockquote>
    <br>
  </body>
</html>