<html>
    <head>
      <base href="http://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Performance disparity between clang/LLVM and GCC when using libjpeg-turbo"
   href="http://llvm.org/bugs/show_bug.cgi?id=16035">16035</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Performance disparity between clang/LLVM and GCC when using libjpeg-turbo
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>3.2
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Macintosh
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>MacOS X
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>-New Bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>dcommander@users.sourceforge.net
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=10526" name="attach_10526" title="libjpeg-turbo performance results, Clang/LLVM vs. GCC, OS X 10.8">attachment 10526</a> <a href="attachment.cgi?id=10526&action=edit" title="libjpeg-turbo performance results, Clang/LLVM vs. GCC, OS X 10.8">[details]</a></span>
libjpeg-turbo performance results, Clang/LLVM vs. GCC, OS X 10.8

I maintain libjpeg-turbo, a heavily accelerated fork of libjpeg for x86/x86-64
and ARM systems.  A large part of our speedup comes from assembly code, but our
Huffman codec relies heavily on C compiler optimizations to achieve peak
performance.  After upgrading to OS X 10.8, which uses Clang/LLVM as the
default compiler rather than GCC, I observed a slowdown of 15-20% when
compressing images using libjpeg-turbo.  It seems to be due to the compiler
having trouble optimizing said Huffman codec (jchuff.c in the libjpeg-turbo
source).  I'll walk you through the steps to reproduce the issue:

NOTE:  This is probably reproducible on other platforms as well, such as
Linux, but I haven't tested it.

Prerequisites:
-- Xcode 4.5.x installed under /Applications/Xcode.app
-- nasm, automake, autoconf, and apple-gcc42 from MacPorts installed under
/opt/local
-- artificial.ppm from <a href="http://www.imagecompression.info/test_images/rgb8bit.zip">http://www.imagecompression.info/test_images/rgb8bit.zip</a>

xcrun svn co svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk libjpeg-turbo
cd libjpeg-turbo
/opt/local/bin/autoreconf -fiv

mkdir osx.64.clang
cd osx.64.clang
sh ../configure --host x86_64-apple-darwin NASM=/opt/local/bin/nasm \
    CC='xcrun clang' CFLAGS=-O4
make
./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
cd ..

mkdir osx.64.llvmgcc
cd osx.64.llvmgcc
sh ../configure --host x86_64-apple-darwin NASM=/opt/local/bin/nasm \
    CC='xcrun gcc' CFLAGS=-O3
make
./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
cd ..

mkdir osx.64.gcc42
cd osx.64.gcc42
sh ../configure --host x86_64-apple-darwin NASM=/opt/local/bin/nasm \
    CC=/opt/local/bin/gcc-apple-4.2 CFLAGS=-O3
make
./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
cd ..

mkdir osx.32.clang
cd osx.32.clang
sh ../configure --host i686-apple-darwin NASM=/opt/local/bin/nasm \
    CC='xcrun clang' CFLAGS='-m32 -O4' LDFLAGS=-m32
make
./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
cd ..

mkdir osx.32.llvmgcc
cd osx.32.llvmgcc
sh ../configure --host i686-apple-darwin NASM=/opt/local/bin/nasm \
    CC='xcrun gcc' CFLAGS='-m32 -O3' LDFLAGS=-m32
make
./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
cd ..

mkdir osx.32.gcc42
cd osx.32.gcc42
sh ../configure --host i686-apple-darwin NASM=/opt/local/bin/nasm \
    CC=/opt/local/bin/gcc-apple-4.2 CFLAGS='-O3 -m32' LDFLAGS=-m32
make
./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
cd ..
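
The six builds above differ only in the --host triplet, the compiler, and the
flags, so they can be sketched as a single loop.  This is a minimal sketch and
not part of the original procedure:  the function names (host_for, cc_for,
opt_for, step, build_one) and the IMAGE and RUN variables are hypothetical
names chosen for illustration, and it assumes that a plain "make" in each
build directory produces tjbench.  By default it only prints the commands;
setting RUN=1 executes them.  Run it from the libjpeg-turbo source directory
after autoreconf.

```shell
#!/bin/sh
# Sketch only: prints the mkdir/configure/make/tjbench steps for all six builds.
# Set RUN=1 to execute them. IMAGE must be an absolute path to artificial.ppm
# (the default below is a placeholder, not a real location).
set -e

IMAGE=${IMAGE:-/path/to/artificial.ppm}
NASM=/opt/local/bin/nasm

# host_for BITS: configure --host triplet for a 64- or 32-bit build
host_for() {
    case "$1" in
        64) printf '%s\n' x86_64-apple-darwin ;;
        32) printf '%s\n' i686-apple-darwin ;;
    esac
}

# cc_for NAME: compiler command for each build-directory suffix
cc_for() {
    case "$1" in
        clang)   printf '%s\n' "xcrun clang" ;;
        llvmgcc) printf '%s\n' "xcrun gcc" ;;
        gcc42)   printf '%s\n' /opt/local/bin/gcc-apple-4.2 ;;
    esac
}

# opt_for NAME: -O4 for clang, -O3 for the two GCC front ends
opt_for() {
    case "$1" in
        clang) printf '%s\n' -O4 ;;
        *)     printf '%s\n' -O3 ;;
    esac
}

# step CMD...: print the command (shell quoting is not shown in the printed
# form), and run it only when RUN=1
step() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# build_one BITS NAME: configure, build, and benchmark one configuration
build_one() {
    bits=$1
    name=$2
    dir="osx.$bits.$name"
    cflags=$(opt_for "$name")
    step mkdir -p "$dir"
    step cd "$dir"
    if [ "$bits" = 32 ]; then
        step sh ../configure --host "$(host_for 32)" "NASM=$NASM" \
            "CC=$(cc_for "$name")" "CFLAGS=-m32 $cflags" LDFLAGS=-m32
    else
        step sh ../configure --host "$(host_for 64)" "NASM=$NASM" \
            "CC=$(cc_for "$name")" "CFLAGS=$cflags"
    fi
    step make
    step ./tjbench "$IMAGE" 95 -rgb -quiet
    step cd ..
}

for bits in 64 32; do
    for name in clang llvmgcc gcc42; do
        build_one "$bits" "$name"
    done
done
```

Printing first makes it easy to compare the generated command lines against
the manual steps before spending time on six full builds.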

A spreadsheet of my results and the test image is attached.  Note that
decompression performance is generally better across the board with Clang/LLVM,
but compression performance is generally worse.  Note also that, when using the
GCC front end to LLVM, the performance falls between that of Clang and that of
GCC 4.2, so it seems that part of the issue may be in Clang and part of it may
be in LLVM.

If there are things I can do within the inner loops of jchuff.c to make it
perform better under Clang/LLVM, I am definitely open to that.</pre>
        </div>
      </p>
    </body>
</html>