[LLVMbugs] [Bug 21760] New: GCC4.9.2 optimizer produces 20% faster code than clang clang-600.0.56 (Xcode 6.1.1)
bugzilla-daemon at llvm.org
Fri Dec 5 08:00:27 PST 2014
http://llvm.org/bugs/show_bug.cgi?id=21760
Bug ID: 21760
Summary: GCC4.9.2 optimizer produces 20% faster code than clang
clang-600.0.56 (Xcode 6.1.1)
Product: clang
Version: 3.5
Hardware: Macintosh
OS: MacOS X
Status: NEW
Severity: normal
Priority: P
Component: LLVM Codegen
Assignee: unassignedclangbugs at nondot.org
Reporter: david.hoerl at gmail.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
Created attachment 13439
--> http://llvm.org/bugs/attachment.cgi?id=13439&action=edit
Archive with source and two compiled apps
As an Apple developer, I got sucked into assisting the libjpeg-turbo project
with a few issues, and then involved myself in their bug 16035. As I told DRC,
just pointing out that the overall performance was slower wasn't going to prod
any particular action by llvm, since there were no real specifics offered.
Because I sort of borked up 16035, I thought it wise to open up a new bug with
specific information and a small demo file that anyone there can use to test
with.
DRC is building his gcc code with version 4.2, and I tested with that binary.
Using -Ofast with the latest clang, the performance of the Huffman coder was as
stated - about 20% slower. The structure of the problematic code is in a
Huffman encoding section, where currently there are 63 replicated blocks of
code, as macros, each with a different "seed" integer. It also appears that for
a "common" jpeg image, foo == 0 is true about eight times as often as not, so it
really helps if the code efficiently handles the 'equals 0' case.
The current macro is structured as:
int x;
int foo;
if((foo = table[x]) == 0) {
    r++;
} else {
    // a bunch of code that acts on foo
}
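To make that structure concrete, here is a minimal sketch of the replicated-macro form. It is not the libjpeg-turbo source: the names BLOCK, ACT_ON, acc and encode_inline are hypothetical, only four blocks are shown instead of 63, and ACT_ON is a made-up stand-in for the real code that acts on foo. It assumes the zero case is the one that bumps r, which is what the disassembly further down indicates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "a bunch of code that acts on foo". */
#define ACT_ON(foo) do { acc += (foo) + r; r = 0; } while (0)

/* One replicated block; `seed` selects the table entry to test. */
#define BLOCK(seed)                       \
    if ((foo = table[(seed)]) == 0) {     \
        r++;            /* common case */ \
    } else {                              \
        ACT_ON(foo);    /* rare case, expanded at every site */ \
    }

/* Four replicated blocks (the real code is described as having 63). */
long encode_inline(const short *table)
{
    int foo;
    int r = 0;
    long acc = 0;

    assert(table != NULL);
    BLOCK(0)
    BLOCK(1)
    BLOCK(2)
    BLOCK(3)
    /* Pack both results into one value so a caller can check them. */
    return acc * 100 + r;
}
```

Every expansion of BLOCK carries its own copy of the "acts on foo" code, which is the shape clang's output is described as mirroring.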
What clang does is produce assembler that mimics each block above. GCC, on the
other hand, creates an intermediate jump table, and produces assembler with the
first test inline followed by an optional jump:
if((foo = table[x]) != 0) {
    jump A;
}
r++;
L_someLabel_A: // for A to return to
if((foo = table[x]) != 0) {
    jump B;
}
r++;
L_someLabel_B: // for B to return to
Great! So I modified the macros to do the same (more or less) in C, using the
address-of-label ("labels as values") C extension, so that I only needed one block of code to jump
to (the common block acts on "foo"). I compiled my modified app, and now clang
was executing the benchmark test with the same performance [new clang with new
code == gcc 4.2 with old code].
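A minimal sketch of what such a rewrite could look like, assuming the GNU "labels as values" extension (`&&label` and `goto *ptr`), which both gcc and clang accept. This is not the code from the attachment: STEP, resume, shared, acc and the two function names are hypothetical, the shared block is a made-up stand-in for the real code that acts on foo, and only four test sites are shown. encode_plain is included purely as a reference to compare against.

```c
#include <assert.h>
#include <stddef.h>

/* One test site. The common zero case falls straight through to r++.
   The rare nonzero case records where to come back to and jumps to the
   single shared block, which returns to `label` (skipping the r++). */
#define STEP(idx, label)                  \
    if ((foo = table[(idx)]) != 0) {      \
        resume = &&label;                 \
        goto shared;                      \
    }                                     \
    r++;                                  \
    label: ;

long encode_goto(const short *table)
{
    int foo;
    int r = 0;
    long acc = 0;
    void *resume = NULL;

    assert(table != NULL);
    STEP(0, back0)
    STEP(1, back1)
    STEP(2, back2)
    STEP(3, back3)
    return acc * 100 + r;

shared:
    /* Hypothetical stand-in for the code that acts on foo. */
    acc += foo + r;
    r = 0;
    goto *resume;   /* GNU extension: jump back to the recorded site */
}

/* Straightforward reference version with the same behaviour. */
long encode_plain(const short *table)
{
    int r = 0;
    long acc = 0;

    assert(table != NULL);
    for (int k = 0; k < 4; k++) {
        int foo = table[k];
        if (foo == 0) {
            r++;
        } else {
            acc += foo + r;
            r = 0;
        }
    }
    return acc * 100 + r;
}
```

The point of the layout is that each site shrinks to a load, a test and a rarely taken branch, while the bulky code exists once. Whether a given compiler keeps that shape is exactly what the rest of this report is about.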
Then I wondered, how would a new version of gcc do with the old code or new
code? I used Homebrew to install gcc 4.9 (4.2 isn't even supported!), and
retested (building with -O3). Ugh! Using my modified code, gcc is again 20%
faster - perhaps even more [new code, new clang < new gcc].
Looking at the assembler (using the fantastic Hopper tool), you can see that
gcc has really squeezed that first test down:
GCC:
0000000100000aa9 movsx r9d, word [ds:rsi+0x30]
; XREF=_encode_one_blockX2+870
0000000100000aae lea r13, qword [ds:0x100000ac1]
0000000100000ab5 test r9d, r9d
0000000100000ab8 jne 0x1000007c0 ; NOTE JUMP ONLY TAKEN IF THE VARIABLE != 0
0000000100000abe add ebp, 0x1
CLANG:
00000001000007de mov rcx, qword [ss:rbp+var_38]
; XREF=_encode_one_blockX2+143
00000001000007e2 mov cx, word [ds:rcx+0x4]
00000001000007e6 test cx, cx
00000001000007e9 je 0x1000007f7 ; JUMP IS USUALLY TAKEN
00000001000007eb lea r11, qword [ds:0x1000007fa]
00000001000007f2 jmp 0x100000cef
00000001000007f7 inc r15d
; XREF=_encode_one_blockX2+169
I've attached the following in an archive:
main.c (so you can build a real binary)
testMacro.c (cc -O3 main.c testMacro.c -o ...)
clang.out (set 'cc' to the latest clang in Xcode)
gcc.out (set 'cc' to gcc4-9)