[cfe-dev] Performance disparity between clang/LLVM and GCC when using libjpeg-turbo

Thu May 16 09:00:20 PDT 2013

Can you file a bugzilla report on llvm.org so we can look at it?

Thanks,

Evan

On May 15, 2013, at 10:32 PM, DRC <dcommander at users.sourceforge.net> wrote:

> Hi.  I maintain libjpeg-turbo, a heavily accelerated fork of libjpeg for x86/x86-64 and ARM systems.  A large part of our speedup comes from assembly code, but our Huffman codec relies heavily on C compiler optimizations to achieve peak performance.  After upgrading to OS X 10.8, which uses Clang/LLVM as the default compiler rather than GCC, I observed a slowdown of 15-20% when compressing images using libjpeg-turbo, and it seems to be due to the compiler having trouble optimizing that Huffman codec.  I'll walk you through the steps to reproduce the issue:
> 
> NOTE:  this is probably reproducible on other platforms, such as Linux, as well.  I haven't tested it.
> 
> Prerequisites:
> -- Xcode 4.5.x installed under /Applications/Xcode.app
> -- nasm, automake, autoconf, and apple-gcc42 from MacPorts installed under /opt/local
> -- artificial.ppm from http://www.imagecompression.info/test_images/rgb8bit.zip
> 
> xcrun svn co svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk libjpeg-turbo
> cd libjpeg-turbo
> /opt/local/bin/autoreconf -fiv
> 
> mkdir osx.64.clang
> cd osx.64.clang
> sh ../configure --host x86_64-apple-darwin NASM=/opt/local/bin/nasm CC='xcrun clang' CFLAGS=-O4
> make
> ./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
> cd ..
> 
> mkdir osx.64.llvmgcc
> cd osx.64.llvmgcc
> sh ../configure --host x86_64-apple-darwin NASM=/opt/local/bin/nasm CC='xcrun gcc' CFLAGS=-O3
> make
> ./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
> cd ..
> 
> mkdir osx.64.gcc42
> cd osx.64.gcc42
> sh ../configure --host x86_64-apple-darwin NASM=/opt/local/bin/nasm CC=/opt/local/bin/gcc-apple-4.2 CFLAGS=-O3
> make
> ./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
> cd ..
> 
> mkdir osx.32.clang
> cd osx.32.clang
> sh ../configure --host i686-apple-darwin NASM=/opt/local/bin/nasm CC='xcrun clang' CFLAGS='-m32 -O4' LDFLAGS=-m32
> make
> ./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
> cd ..
> 
> mkdir osx.32.llvmgcc
> cd osx.32.llvmgcc
> sh ../configure --host i686-apple-darwin NASM=/opt/local/bin/nasm CC='xcrun gcc' CFLAGS='-m32 -O3' LDFLAGS=-m32
> make
> ./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
> cd ..
> 
> mkdir osx.32.gcc42
> cd osx.32.gcc42
> sh ../configure --host i686-apple-darwin NASM=/opt/local/bin/nasm CC=/opt/local/bin/gcc-apple-4.2 CFLAGS='-O3 -m32' LDFLAGS=-m32
> make
> ./tjbench {path_to}/artificial.ppm 95 -rgb -quiet
> cd ..
> 
> A spreadsheet of my results is attached.  Note that decompression performance is generally better across the board with Clang/LLVM, but compression performance is generally worse.  Note also that, when using the GCC front end to LLVM (the llvmgcc builds above, i.e. CC='xcrun gcc'), compression performance falls between that of Clang and GCC 4.2, so it seems that part of the issue may be in Clang and part of it may be in LLVM.
> 
> If there are things I can do within the inner loops of jchuff.c to make it perform better under Clang/LLVM, I am definitely open to that.
> 
> DRC
> <libjpegturbo-1.3.ods>
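
For convenience, the six configurations quoted above could be driven from a single script. What follows is only a sketch under stated assumptions, not something from the original message: the function names list_configs and run_all are made up for illustration, the script assumes it is run from the libjpeg-turbo source root after autoreconf, and it adds an explicit make step between configure and tjbench. It also passes an empty LDFLAGS for the 64-bit builds, which should be equivalent to leaving it unset.

```shell
#!/bin/sh
# Sketch only: build and benchmark each compiler configuration from the
# quoted steps, each in its own build directory.
# list_configs and run_all are hypothetical helper names.

# Print one line per configuration: dir|host|CC|CFLAGS|LDFLAGS
list_configs() {
    gcc42=/opt/local/bin/gcc-apple-4.2
    cat <<EOF
osx.64.clang|x86_64-apple-darwin|xcrun clang|-O4|
osx.64.llvmgcc|x86_64-apple-darwin|xcrun gcc|-O3|
osx.64.gcc42|x86_64-apple-darwin|$gcc42|-O3|
osx.32.clang|i686-apple-darwin|xcrun clang|-m32 -O4|-m32
osx.32.llvmgcc|i686-apple-darwin|xcrun gcc|-m32 -O3|-m32
osx.32.gcc42|i686-apple-darwin|$gcc42|-O3 -m32|-m32
EOF
}

# Configure, build, and benchmark every configuration.
# $1 must be an absolute path to the test image, because each build
# runs from inside its own subdirectory.
run_all() {
    image=$1
    list_configs | while IFS='|' read -r dir host cc cflags ldflags; do
        echo "== $dir =="
        (
            # The subshell keeps the cd local to this one configuration.
            mkdir -p "$dir" && cd "$dir" || exit 1
            sh ../configure --host "$host" NASM=/opt/local/bin/nasm \
                CC="$cc" CFLAGS="$cflags" LDFLAGS="$ldflags" &&
            make &&
            ./tjbench "$image" 95 -rgb -quiet
        ) < /dev/null || echo "$dir failed" >&2
    done
}
```

After sourcing the script, calling run_all with the absolute path to artificial.ppm would run all six builds in sequence and print the tjbench output under a header for each build directory. Running each build in a subshell means a failure in one configuration does not leave the shell in the wrong directory for the next.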