<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - BranchProbabilities::scale is very hot function but it's assembly is very inefficient."
   href="https://llvm.org/bugs/show_bug.cgi?id=24620">24620</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>BranchProbabilities::scale is a very hot function, but its assembly is very inefficient.
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>HP
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Support Libraries
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>cmtice@google.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=14791" name="attach_14791" title="gzip'd .ii file">attachment 14791</a></span>
gzip'd .ii file

While recently examining a performance problem in clang (8x slower than GCC,
see <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Compilation wtih Clang is 8x slower than GCC, mostly due to Greedy Register Allocator"
   href="show_bug.cgi?id=24618">https://llvm.org/bugs/show_bug.cgi?id=24618</a>), we ran 'perf' on
clang and saw that in this case the hottest function was
llvm::BranchProbabilities::scale: 20.69% of the entire compilation was spent in
this function.

Looking more closely at the function's assembly, annotated by perf with the share
of this function's samples that landed on each instruction, we saw:

  0.08 │      xor    %edx,%edx
  0.15 │      imul   %rax,%rdi
  2.51 │      shr    $0x20,%rcx
  0.00 │      imul   %rax,%rcx
  0.93 │      mov    %rdi,%rsi
  0.45 │      mov    %rcx,%rax
  0.86 │      shr    $0x20,%rsi
  0.69 │      shr    $0x20,%rax
  1.01 │      add    %esi,%ecx
  0.41 │      mov    $0xffffffffffffffff,%rsi
  0.26 │      setb   %dl
  0.55 │      add    %edx,%eax
  0.85 │      cmp    %eax,%r8d
       │    ↓ ja     50
       │49:   mov    %rsi,%rax
  1.33 │    ← retq
       │      nop
  0.93 │50:   shl    $0x20,%rax
  0.33 │      mov    %ecx,%ecx
       │      xor    %edx,%edx
  0.05 │      or     %rcx,%rax
  1.00 │      mov    $0xffffffff,%r9d
  0.27 │      div    %r8
 32.45 │      cmp    %r9,%rax
  1.14 │      mov    %rax,%rcx
  0.74 │    ↑ ja     49
  0.98 │      mov    %rdx,%rax
  0.08 │      mov    %edi,%edi
  0.03 │      xor    %edx,%edx
  0.40 │      shl    $0x20,%rax
  0.94 │      shl    $0x20,%rcx
  0.03 │      or     %rdi,%rax
  0.50 │      div    %r8
 43.53 │      add    %rcx,%rax
  1.25 │      cmovae %rax,%rsi
  2.61 │    ↑ jmp    49
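
For reference, here is a C++ sketch of what the listing above appears to
compute. It is reconstructed from the assembly, not copied from the LLVM
sources, and the names (scale_two_div, u64, u32, Max64, Max32) are hypothetical.
Num is split into 32-bit halves, each half is multiplied by the 32-bit N (the two
imuls), and the resulting 96-bit product is divided by the 32-bit D one 64-bit
chunk at a time, schoolbook style. Each chunk costs one 'div': the first yields
the upper quotient digit plus a remainder, and the second divides that remainder
with the low product digit brought down.

```cpp
// Hypothetical reconstruction of the listing (a sketch, not the LLVM source):
// returns floor(Num * N / D), saturating to the maximum 64-bit value when the
// quotient does not fit. Precondition: D != 0.
using u64 = unsigned long long;
using u32 = unsigned int;

u64 scale_two_div(u64 Num, u32 N, u32 D) {
  const u64 Max64 = ~0ULL;
  const u64 Max32 = 0xFFFFFFFFULL;

  // The two imuls: multiply each 32-bit half of Num by N.
  u64 ProductHigh = (Num >> 32) * N;
  u64 ProductLow = (Num & Max32) * N;

  // Split the 96-bit product into three 32-bit digits.
  u32 Upper32 = u32(ProductHigh >> 32);
  u32 Lower32 = u32(ProductLow & Max32);
  u32 Mid32Partial = u32(ProductHigh & Max32);
  u32 Mid32 = Mid32Partial + u32(ProductLow >> 32);
  Upper32 += Mid32 < Mid32Partial; // carry out of the middle digit (setb)

  // cmp/ja: if the top digit is not below D, the quotient needs > 64 bits.
  if (Upper32 >= D)
    return Max64;

  // First div: divide the top 64 bits (Upper32:Mid32) by D.
  u64 Rem = (u64(Upper32) << 32) | Mid32;
  u64 UpperQ = Rem / D;
  if (UpperQ > Max32)
    return Max64;

  // Second div: bring down the low digit and divide again.
  Rem = ((Rem % D) << 32) | Lower32;
  u64 LowerQ = Rem / D;
  u64 Q = (UpperQ << 32) + LowerQ;

  // add/cmovae: saturate if the final add wrapped around.
  return Q < LowerQ ? Max64 : Q;
}
```

As written, the two later overflow checks can never fire once Upper32 < D holds,
because every quotient digit is then below 2^32; they are cheap next to the divs,
though.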


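It appears that about 76% of the time in this function (32.45% + 43.53%) is
spent on the two 'div' instructions; perf typically charges a slow instruction's
cycles to the instruction right after it, which is why those samples show up on
the 'cmp' and the 'add' that follow each div. This is very inefficient: the two
divs ought to be done together, possibly halving the time spent in this function.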
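
One way the two divs could be combined, sketched under the assumption of
GCC/Clang inline assembly on x86_64 (scale_one_div and the other names are
hypothetical): form the whole product as a 128-bit value Hi:Lo, where Hi holds
at most the top 32 bits. On x86_64, 'div' with a 64-bit operand divides RDX:RAX
by that operand in a single instruction, provided RDX is smaller than the divisor
(otherwise it raises a divide error). Hi < D is also exactly the condition for the
quotient to fit in 64 bits, so one comparison both guards the instruction and
detects overflow.

```cpp
// Hypothetical single-division variant (a sketch, not LLVM code):
// returns floor(Num * N / D), or the maximum 64-bit value if the
// quotient does not fit in 64 bits. Precondition: D != 0.
using u64 = unsigned long long;
using u32 = unsigned int;

u64 scale_one_div(u64 Num, u32 N, u32 D) {
  const u64 Max64 = ~0ULL;
  const u64 Max32 = 0xFFFFFFFFULL;

  // Partial products: ProductHigh has weight 2^32, ProductLow weight 1.
  u64 ProductHigh = (Num >> 32) * N;
  u64 ProductLow = (Num & Max32) * N;

  // Combine them into the full product Hi:Lo (64 bits each).
  u64 Lo = ProductLow + (ProductHigh << 32);
  u64 Hi = (ProductHigh >> 32) + (Lo < ProductLow); // carry into bit 64

  // The quotient fits in 64 bits iff Hi < D; checking this first also
  // keeps the hardware div from faulting.
  if (Hi >= D)
    return Max64;

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  // One 128-by-64-bit division: RDX:RAX / D, quotient in RAX, remainder in RDX.
  u64 Q, R;
  __asm__("divq %[den]"
          : "=a"(Q), "=d"(R)
          : "a"(Lo), "d"(Hi), [den] "r"(u64(D))
          : "cc");
  (void)R;
  return Q;
#else
  // Fallback for other 64-bit GCC/Clang targets via the __int128 extension.
  unsigned __int128 P = ((unsigned __int128)Hi << 64) | Lo;
  return u64(P / D);
#endif
}
```

Whether one 128-by-64 div is actually faster than two divs whose high half is
zero depends on the microarchitecture, so this would need measuring. The
__int128 fallback may be lowered to a runtime call (such as __udivti3) rather
than a single instruction.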

(This is on Intel x86_64, BTW, in case it's not obvious from the assembly.)

This is with top-of-tree (ToT) Clang/LLVM, built with:

$ cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/tmp/llvm-install.opt \
    -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=On &lt;path-to-llvm&gt;
$ make all
$ make install

Attached is a gzip'd version of the .ii file we used.  The clang command to
compile this file is:

/usr/local/google2/cmtice/llvm-work/llvm-install.opt/bin/clang++ -c \
-fno-exceptions -Wno-multichar -m64 -Wa,--noexecstack -fPIC \
-no-canonical-prefixes -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector \
-D__STDC_FORMAT_MACROS -D__STDC_CONSTANT_MACROS -DANDROID -fmessage-length=0 -W \
-Wall -Wno-unused -Winit-self -Wpointer-arith -g -fno-strict-aliasing \
-DNDEBUG -UDEBUG -D__compiler_offsetof=__builtin_offsetof \
-Werror=int-conversion -Wno-reserved-id-macro -Wno-format-pedantic \
-Wno-unused-command-line-argument -target x86_64-linux-gnu -DANDROID \
-fmessage-length=0 -W -Wall -Wno-unused -Winit-self -Wpointer-arith \
-Wsign-promo -DNDEBUG -UDEBUG -Wno-inconsistent-missing-override \
-target x86_64-linux-gnu -DBUILDING_LIBART=1 -Wthread-safety -Wthread-safety-negative \
-Wimplicit-fallthrough -Wfloat-equal -Wint-to-void-pointer-cast \
-Wused-but-marked-unused -Wdeprecated -Wunreachable-code-break \
-Wunreachable-code-return -Wmissing-noreturn -fno-omit-frame-pointer -fno-rtti \
-std=gnu++11 -ggdb3 -Wall -Werror -Wextra -Wstrict-aliasing -fstrict-aliasing \
-Wunreachable-code -Wredundant-decls -Wshadow -Wunused -fvisibility=protected \
-DART_DEFAULT_GC_TYPE_IS_CMS -DIMT_SIZE=64 -DART_BASE_ADDRESS=0x60000000 \
-DART_DEFAULT_INSTRUCTION_SET_FEATURES=default \
-DART_BASE_ADDRESS_MIN_DELTA=-0x1000000 -DART_BASE_ADDRESS_MAX_DELTA=0x1000000 \
-DART_DEFAULT_INSTRUCTION_SET_FEATURES="default" -O3 -Wframe-larger-than=2700 \
-fPIC -D_USING_LIBCXX -std=gnu++14 -nostdinc++ -Werror=int-to-pointer-cast \
-Werror=pointer-to-int-cast -Werror=address-of-temporary \
-Werror=null-dereference -Werror=return-type -o interpreter_goto_table_impl.o \
./interpreter_goto_table_impl.ii</pre>
        </div>
      </p>
    </body>
</html>