[LLVMbugs] [Bug 21760] New: GCC4.9.2 optimizer produces 20% faster code than clang clang-600.0.56 (Xcode 6.1.1)

bugzilla-daemon at llvm.org
Fri Dec 5 08:00:27 PST 2014


http://llvm.org/bugs/show_bug.cgi?id=21760

            Bug ID: 21760
           Summary: GCC4.9.2 optimizer produces 20% faster code than clang
                    clang-600.0.56 (Xcode 6.1.1)
           Product: clang
           Version: 3.5
          Hardware: Macintosh
                OS: MacOS X
            Status: NEW
          Severity: normal
          Priority: P
         Component: LLVM Codegen
          Assignee: unassignedclangbugs at nondot.org
          Reporter: david.hoerl at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 13439
  --> http://llvm.org/bugs/attachment.cgi?id=13439&action=edit
Archive with source and two compiled apps

As an Apple developer, I got sucked into assisting the libjpeg-turbo project
with a few issues, and then involved myself in their bug 16035. As I told DRC,
just pointing out that the overall performance was slower wasn't going to prompt
any particular action by llvm, since no real specifics were offered.
Because I sort of borked up 16035, I thought it wise to open a new bug with
specific information and a small demo file that anyone can use to test
with.

DRC builds his code with gcc 4.2, and I tested with that binary.
Using -Ofast with the latest clang, the Huffman coder was, as stated, about
20% slower. The problematic code is in the Huffman encoding section, where
there are currently 63 replicated blocks of code, as macros, each with a
different "seed" integer. It also appears that for a "common" JPEG image,
foo == 0 is true about eight times as often as not, so it really helps if the
code handles the 'equals 0' case efficiently.

The current macro is structured as:

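const short *table;  // coefficients of the current block
int x;               // index; differs per replicated block (the "seed")
int foo;             // current coefficient
int r;               // length of the current run of zeros

if ((foo = table[x]) == 0) {
  r++;
} else {
  // a bunch of code that acts on foo
}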
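For anyone without the attachment, the following is a minimal, self-contained sketch of that replicated-macro structure, scaled down to a toy 8-entry block (seeds 1 to 7 instead of 1 to 63). It is not the actual libjpeg-turbo or testMacro.c code: `STEP`, `emit` and `encode` are hypothetical names, and `emit` merely records each (zero-run length, value) pair where the real encoder would output Huffman codes.

```c
#include <assert.h>

/* Hypothetical stand-in for the real Huffman output code: it only records
   each (zero-run length, nonzero value) pair. */
static int out_run[8], out_val[8], n_out;

static void emit(int run, int val) {
    assert(n_out < 8);           /* guard the toy output buffers */
    out_run[n_out] = run;
    out_val[n_out] = val;
    n_out++;
}

/* One replicated block per "seed" k: the common zero case only bumps the
   run counter; the rare nonzero case does the heavier work on foo. */
#define STEP(k)                      \
    if ((foo = table[k]) == 0) {     \
        r++;                         \
    } else {                         \
        emit(r, foo);                \
        r = 0;                       \
    }

/* Walks entries 1..7 and returns the length of the trailing zero run. */
static int encode(const short *table) {
    int foo, r = 0;
    n_out = 0;
    STEP(1) STEP(2) STEP(3) STEP(4) STEP(5) STEP(6) STEP(7)
    return r;
}
```

With `table = {5, 0, 0, 3, 0, 0, 0, 0}`, `encode` records one pair (run 2, value 3) and returns 4. Each `STEP` expands to its own full if/else, which is the per-block shape clang reproduces.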

What clang does is produce assembly that mimics each block above. GCC, on the
other hand, creates an intermediate jump table and produces assembly with the
first test inline, followed by a conditional jump:

if ((foo = table[x]) != 0) {
  goto A;          // rare case: shared out-of-line block
}
r++;               // common case: foo == 0
L_someLabel_A:     // for A to return to

if ((foo = table[x]) != 0) {
  goto B;
}
r++;
L_someLabel_B:     // for B to return to

Great! So I modified the macros to do the same (more or less) in C, using the
GNU "address of a label" (labels-as-values) C extension, so that I only needed
one block of code to jump to (the common block that acts on "foo"). I compiled
my modified app, and now clang ran the benchmark with the same performance
[new clang with new code == gcc 4.2 with old code].
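A minimal sketch of that kind of rewrite, on the same toy 8-entry block, using the GNU labels-as-values extension (`&&label` to take a label's address, `goto *ptr` to jump to it), which both GCC and clang accept. This is not the reporter's actual testMacro.c: `STEP`, `emit`, `encode` and the labels `S1` to `S7` are hypothetical, and `emit` only records (run, value) pairs.

```c
#include <assert.h>

/* Hypothetical stand-in for the real Huffman output code. */
static int out_run[8], out_val[8], n_out;

static void emit(int run, int val) {
    assert(n_out < 8);           /* guard the toy output buffers */
    out_run[n_out] = run;
    out_val[n_out] = val;
    n_out++;
}

/* Each expansion keeps only the common-case test inline. 'ret' is loaded
   with the resume address first, and only a nonzero coefficient jumps to
   the single shared block. Requires GNU labels-as-values. */
#define STEP(k, resume)                          \
    ret = &&resume;                              \
    if ((foo = table[k]) != 0) goto nonzero;     \
    r++;                                         \
    resume:

/* Walks entries 1..7 and returns the length of the trailing zero run. */
static int encode(const short *table) {
    int foo, r = 0;
    void *ret = 0;
    n_out = 0;

    STEP(1, S1) STEP(2, S2) STEP(3, S3) STEP(4, S4)
    STEP(5, S5) STEP(6, S6) STEP(7, S7)
    return r;

nonzero:                 /* the one shared block that acts on foo */
    emit(r, foo);
    r = 0;
    goto *ret;           /* resume right after the originating test */
}
```

Loading `ret` before the test plays the role of the unconditional `lea` in the GCC listing, so the common zero path is just a load, a test, one rarely taken branch and the increment. It produces the same results as the plain if/else form.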

Then I wondered: how would a new version of gcc do with the old code or the
new code? I used Homebrew to install gcc 4.9 (4.2 isn't even supported!) and
retested (building with -O3). Ugh! Using my modified code, gcc is again 20%
faster, perhaps even more [new code, new clang < new gcc].

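Looking at the assembly (using the fantastic Hopper tool), you can see that
gcc has really squeezed that first test down to a load, a test and a single
branch taken only in the rare nonzero case, while clang first reloads a
pointer from the stack and then takes a branch in the common zero case: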

GCC:

0000000100000aa9         movsx      r9d, word [ds:rsi+0x30]     ; XREF=_encode_one_blockX2+870
0000000100000aae         lea        r13, qword [ds:0x100000ac1]
0000000100000ab5         test       r9d, r9d
0000000100000ab8         jne        0x1000007c0                 ; NOTE JUMP ONLY TAKEN IF THE VARIABLE != 0
0000000100000abe         add        ebp, 0x1


CLANG:

00000001000007de         mov        rcx, qword [ss:rbp+var_38]  ; XREF=_encode_one_blockX2+143
00000001000007e2         mov        cx, word [ds:rcx+0x4]
00000001000007e6         test       cx, cx
00000001000007e9         je         0x1000007f7                 ; JUMP IS USUALLY TAKEN
00000001000007eb         lea        r11, qword [ds:0x1000007fa]
00000001000007f2         jmp        0x100000cef
00000001000007f7         inc        r15d                        ; XREF=_encode_one_blockX2+169


I've attached the following in an archive:

main.c (so you can build a real binary)
testMacro.c (cc -O3 main.c testMacro.c -o ...)
clang.out (built with 'cc' set to the latest clang in Xcode)
gcc.out (built with 'cc' set to gcc 4.9)
