<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><span class="vcard"><a class="email" href="mailto:james.molloy@arm.com" title="James Molloy <james.molloy@arm.com>"> <span class="fn">James Molloy</span></a>
</span> changed
              <a class="bz_bug_link 
          bz_status_RESOLVED  bz_closed"
   title="RESOLVED INVALID - ARM code runs 2x slower compared to gcc"
   href="https://llvm.org/bugs/show_bug.cgi?id=26450">bug 26450</a>
        <br>
             <table border="1" cellspacing="0" cellpadding="8">
          <tr>
            <th>What</th>
            <th>Removed</th>
            <th>Added</th>
          </tr>

         <tr>
           <td style="text-align:right;">Status</td>
           <td>NEW
           </td>
           <td>RESOLVED
           </td>
         </tr>

         <tr>
           <td style="text-align:right;">Resolution</td>
           <td>---
           </td>
           <td>INVALID
           </td>
         </tr></table>
      <p>
        <div>
            <b><a class="bz_bug_link 
          bz_status_RESOLVED  bz_closed"
   title="RESOLVED INVALID - ARM code runs 2x slower compared to gcc"
   href="https://llvm.org/bugs/show_bug.cgi?id=26450#c5">Comment # 5</a>
              on <a class="bz_bug_link 
          bz_status_RESOLVED  bz_closed"
   title="RESOLVED INVALID - ARM code runs 2x slower compared to gcc"
   href="https://llvm.org/bugs/show_bug.cgi?id=26450">bug 26450</a>
              from <span class="vcard"><a class="email" href="mailto:james.molloy@arm.com" title="James Molloy <james.molloy@arm.com>"> <span class="fn">James Molloy</span></a>
</span></b>
        <pre>Hi,

OK, there are two things here:

Firstly, it seems __umodsi3 and friends are significantly slower than
__aeabi_idivmod. GCC is generating __aeabi_idivmod - perhaps we should too? We
currently select __modsi3 unless the target is EABI or Android - I suspect that
list should be EABI, Android or GNUEABI.

GCC 4.9:        1.24s
Clang 3.7:      3.48s
Clang 3.7 (using __aeabi_idivmod): 1.15s
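
To make the selection rule above concrete, here is a minimal sketch in C. The
enum and function names are hypothetical and are not LLVM's actual code; the
sketch only mirrors the condition as described in this comment.

```c
/* Hypothetical environment tags for illustration; not LLVM's real types. */
enum env { ENV_EABI, ENV_ANDROID, ENV_GNUEABI, ENV_OTHER };

/* Behaviour as described above: only EABI and Android get the AEABI helper. */
const char *rem_libcall_current(enum env e) {
    return (e == ENV_EABI || e == ENV_ANDROID) ? "__aeabi_idivmod" : "__modsi3";
}

/* Suggested behaviour: GNUEABI is treated the same way as EABI and Android. */
const char *rem_libcall_proposed(enum env e) {
    return (e == ENV_EABI || e == ENV_ANDROID || e == ENV_GNUEABI)
               ? "__aeabi_idivmod"
               : "__modsi3";
}
```

Under this sketch only the GNUEABI case changes: it moves from __modsi3 to
__aeabi_idivmod, and every other environment keeps its current libcall.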

Secondly, you're not specifying a CPU. Without one the compiler can't assume
hardware divide, which is why your division is going out to the library. Unless
you're on a core without it, such as a Cortex-A9, you'll have hardware
division. Use -mcpu to enable it.

GCC 4.9 with -mcpu=cortex-a15: 276ms
Clang 3.7 with -mcpu=cortex-a15: 258ms

(I had to switch to perf stat's task-clock metric because elapsed time was
getting too noisy.)
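
A small way to check the -mcpu point yourself, as a sketch: the file name rem.c
and the function names are made up for illustration, and the target triple in
the commands below is an assumption about your setup.

```c
/* rem.c (hypothetical file name): a remainder the compiler has to lower
 * either to a runtime library call or to a hardware divide sequence. */
int rem_i32(int a, int b) {
    return a % b;
}

/* The same value written as divide, then multiply-subtract. This is the
 * shape a hardware-divide lowering is expected to take. */
int rem_i32_via_div(int a, int b) {
    int q = a / b;    /* C division truncates toward zero */
    return a - q * b; /* remainder recovered from the quotient */
}
```

Compile it both ways and compare the assembly for rem_i32:

    clang --target=arm-linux-gnueabihf -O2 -S rem.c
    clang --target=arm-linux-gnueabihf -O2 -mcpu=cortex-a15 -S rem.c

With no CPU given I'd expect a call into the runtime library; with
-mcpu=cortex-a15 I'd expect a divide instruction and no call.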

By the way: "I've just discovered how immature LLVM/Clang was on ARM." (from
<a href="https://users.rust-lang.org/t/executable-size-and-performance-vs-c/4496/34">https://users.rust-lang.org/t/executable-size-and-performance-vs-c/4496/34</a>)

That's a little over the top - the ARM backend is around 10 years old now and
is fairly mature.

James</pre>
        </div>
      </p>
    </body>
</html>