[LLVMdev] trunk's optimizer generates slower code than 3.5
Jack Howarth
howarth.mailing.lists at gmail.com
Sat Feb 14 09:18:24 PST 2015
The same 22% performance regression also exists in current llvm/clang
trunk for the SciMark2 Sparse matmult benchmark.
On Sat, Feb 14, 2015 at 12:11 PM, Jack Howarth
<howarth.mailing.lists at gmail.com> wrote:
> Using the SciMark 2.0 code from
> http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the
> same...
>
> make CFLAGS="-O3 -march=native"
>
> I am able to reproduce the 22% performance regression in the run time
> of the Sparse matmult benchmark.
> For 10 runs of the scimark2 benchmark, I get 998.439+/-0.4828 with
> the release llvm clang 3.5.1 compiler
> and 1217.363+/-1.1004 for the current clang 3.6svn from 3.6 branch. Not good.
> Jack
>
> On Sat, Feb 14, 2015 at 11:19 AM, Jack Howarth
> <howarth.mailing.lists at gmail.com> wrote:
>> Do any of the build-bots routinely run the SciMark v2.0 benchmark?
>> If so, might not an examination of those logs reveal the commit range
>> at which the optimizations in that benchmark degraded?
>> Jack
>>
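If per-revision builds are available, narrowing the commit range could be done by bisection rather than by reading logs. A minimal sketch, assuming the regression appears at one revision and persists afterwards; `is_bad_fn`, `first_bad_revision` and `example_is_bad` are hypothetical names, and the cut-over revision 1234 is made up purely for illustration:

```c
/* Predicate type: returns nonzero if the given revision shows the regression. */
typedef int (*is_bad_fn)(int revision);

/*
 * Binary search for the first revision that shows the regression.
 * Assumes `good` is known to be fast, `bad` is known to be slow,
 * good < bad, and every revision after the first bad one is also bad.
 */
int first_bad_revision(int good, int bad, is_bad_fn is_bad)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid))
            bad = mid;   /* regression already present at mid */
        else
            good = mid;  /* mid is still fast */
    }
    return bad;
}

/*
 * Hypothetical stand-in for "build this revision and time the benchmark".
 * The threshold 1234 is invented for the example, not a real revision.
 */
int example_is_bad(int revision)
{
    return revision >= 1234;
}
```

Calling `first_bad_revision(1000, 2000, example_is_bad)` takes about ten probes. In practice the predicate would build the compiler at that revision and compare the benchmark result against a threshold.
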
>> On Sat, Feb 14, 2015 at 11:13 AM, Jack Howarth
>> <howarth.mailing.lists at gmail.com> wrote:
>>> The regressions in the performance of generated code, introduced
>>> by the llvm 3.6 release, don't seem to be limited to this "8 queens
>>> puzzle" solver test case. See...
>>>
>>> http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1
>>>
>>> where a big hit in the performance of the Sparse Matrix Multiply test
>>> of the SciMark v2.0 benchmark was observed, along with other regressions.
>>> Do you really want to release 3.6 with this level of performance regression?
>>> Jack
>>>
>>> On Fri, Feb 13, 2015 at 2:47 PM, Jack Howarth
>>> <howarth.mailing.lists at gmail.com> wrote:
>>>> Also confirmed with the llvm 3.5.1 release and the llvm 3.6 release
>>>> branch on x86_64-apple-darwin14...
>>>>
>>>> % clang-3.5 -O3 -mssse3 -fomit-frame-pointer -fno-stack-protector
>>>> -fno-exceptions -o 8 8.c
>>>> % time ./8 9
>>>> 352 solutions
>>>> 3.603u 0.002s 0:03.60 100.0% 0+0k 0+0io 2pf+0w
>>>> % time ./8 10
>>>> 724 solutions
>>>> 104.217u 0.059s 1:44.30 99.9% 0+0k 0+0io 2pf+0w
>>>>
>>>> % clang-3.6 -O3 -mssse3 -fomit-frame-pointer -fno-stack-protector
>>>> -fno-exceptions -o 8 8.c
>>>> % time ./8 9
>>>> 352 solutions
>>>> 4.050u 0.001s 0:04.05 100.0% 0+0k 0+0io 2pf+0w
>>>> % time ./8 10
>>>> 724 solutions
>>>> 114.808u 0.041s 1:54.86 99.9% 0+0k 0+0io 2pf+0w
>>>>
>>>> On Fri, Feb 13, 2015 at 3:37 AM, 191919 <191919 at gmail.com> wrote:
>>>>> I submitted the problem report to clang's bugzilla but no one seems to
>>>>> care so I have to send it to the mailing list.
>>>>>
>>>>> clang 3.7 svn (trunk 229055 at the time I reported this problem)
>>>>> generates slower code than 3.5 (Apple LLVM version 6.0
>>>>> (clang-600.0.56) (based on LLVM 3.5svn)) for the following code.
>>>>>
>>>>> It is an "8 queens puzzle" solver written as an educational example.
>>>>> The builds from both clang 3.5 and 3.7 give the correct answer, but
>>>>> clang 3.5 generates code which runs 20% faster than 3.6/3.7.
>>>>>
>>>>> ##########################################
>>>>> # clang 3.5 which comes with Xcode 6.1.1
>>>>> ##########################################
>>>>> $ clang -O3 -mssse3 -fomit-frame-pointer -fno-stack-protector
>>>>> -fno-exceptions -o 8 8.c
>>>>> $ time ./8 9 # 9 queens
>>>>> 352 solutions
>>>>> ./8 9 1.63s user 0.00s system 99% cpu 1.632 total
>>>>> $ time ./8 10 # 10 queens
>>>>> 724 solutions
>>>>> ./8 10 45.11s user 0.01s system 99% cpu 45.121 total
>>>>>
>>>>> ##########################################
>>>>> # clang 3.7 svn trunk
>>>>> ##########################################
>>>>> $ /opt/bin/clang -O3 -mssse3 -fomit-frame-pointer -fno-stack-protector
>>>>> -fno-exceptions -o 8 8.c
>>>>> $ time ./8 9 # 9 queens
>>>>> 352 solutions
>>>>> ./8 9 2.07s user 0.00s system 99% cpu 2.078 total
>>>>> $ time ./8 10 # 10 queens
>>>>> 724 solutions
>>>>> ./8 10 56.63s user 0.02s system 99% cpu 56.650 total
>>>>>
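The "20% faster" figure depends on which build is taken as the baseline. A small sketch of the two ways to express the gap, using the 10-queens user times quoted above (45.11s and 56.63s); the function names are mine, not from any tool:

```c
#include <stdio.h>

/* Extra run time of the new build, as a percentage of the old build's time. */
double slowdown_percent(double old_seconds, double new_seconds)
{
    return (new_seconds - old_seconds) / old_seconds * 100.0;
}

/* Run time saved by the old build, as a percentage of the new build's time. */
double time_saved_percent(double old_seconds, double new_seconds)
{
    return (new_seconds - old_seconds) / new_seconds * 100.0;
}

/* Print both views of the same pair of timings. */
void report(const char *label, double old_seconds, double new_seconds)
{
    printf("%s: new build takes %.1f%% longer; old build saves %.1f%%\n",
           label,
           slowdown_percent(old_seconds, new_seconds),
           time_saved_percent(old_seconds, new_seconds));
}
```

With `report("10 queens", 45.11, 56.63)` the trunk build comes out roughly 25% slower, which is the same gap as the 3.5 build finishing in roughly 20% less time.
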
>>>>> The source code is below. I also attached the executable files as well
>>>>> as the assembly code files for clang 3.5 and 3.6, as disassembled by IDA.
>>>>>
>>>>> The performance is even worse when compiling as 32-bit code while
>>>>> gcc-4.9.2 is not affected.
>>>>>
>>>>> ########## clang-3.5
>>>>> $ clang -m32 -O3 -fomit-frame-pointer -fno-stack-protector
>>>>> -fno-exceptions -o 8 8.c
>>>>> $ time ./8 9
>>>>> 352 solutions
>>>>> ./8 9 1.95s user 0.00s system 99% cpu 1.950 total
>>>>>
>>>>> ########## clang-3.7
>>>>> $ /opt/bin/clang -m32 -O3 -fomit-frame-pointer -fno-stack-protector
>>>>> -fno-exceptions -o 8 8.c
>>>>> $ time ./8 9
>>>>> 352 solutions
>>>>> ./8 9 2.48s user 0.00s system 99% cpu 2.480 total
>>>>>
>>>>> ######### gcc-4.9.2
>>>>> $ /opt/bin/gcc -m32 -O3 -fomit-frame-pointer -fno-stack-protector
>>>>> -fno-exceptions -o 8 8.c
>>>>> $ time ./8 9
>>>>> 352 solutions
>>>>> ./8 9 1.44s user 0.00s system 99% cpu 1.442 total
>>>>>
>>>>>
>>>>> ```
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>>
>>>>> static inline int validate(int* a, int d)
>>>>> {
>>>>>     int i, j, x;
>>>>>     for (i = 0; i < d; ++i)
>>>>>     {
>>>>>         for (j = i+1, x = 1; j < d; ++j, ++x)
>>>>>         {
>>>>>             const int d = a[i] - a[j];
>>>>>             if (d == 0 || d == -x || d == x) return 0;
>>>>>         }
>>>>>     }
>>>>>     return 1;
>>>>> }
>>>>>
>>>>> static inline int solve(int d)
>>>>> {
>>>>>     int r = 0;
>>>>>     int* a = (int*) calloc(sizeof(int), d+1);
>>>>>     int p = d - 1;
>>>>>
>>>>>     for (;;)
>>>>>     {
>>>>>         a[p]++;
>>>>>
>>>>>         if (a[p] > d-1)
>>>>>         {
>>>>>             int bp = p - 1;
>>>>>             while (bp >= 0)
>>>>>             {
>>>>>                 a[bp]++;
>>>>>                 if (a[bp] <= d-1) break;
>>>>>                 a[bp] = 0;
>>>>>                 --bp;
>>>>>             }
>>>>>             if (bp < 0)
>>>>>                 break;
>>>>>             a[p] = 0;
>>>>>         }
>>>>>         if (validate(a, d))
>>>>>         {
>>>>>             ++r;
>>>>>         }
>>>>>     }
>>>>>
>>>>>     free(a);
>>>>>     return r;
>>>>> }
>>>>>
>>>>> int main(int argc, char** argv)
>>>>> {
>>>>>     if (argc != 2) return -1;
>>>>>     int r = solve((int) strtol(argv[1], NULL, 10));
>>>>>     printf("%d solutions\n", r);
>>>>> }
>>>>> ```
>>>>>
>>>>> clang 3.5's result:
>>>>>
>>>>> ```
>>>>> public _main
>>>>> _main proc near
>>>>>
>>>>> var_48 = qword ptr -48h
>>>>> var_40 = qword ptr -40h
>>>>> var_34 = dword ptr -34h
>>>>>
>>>>> push rbp
>>>>> push r15
>>>>> push r14
>>>>> push r13
>>>>> push r12
>>>>> push rbx
>>>>> sub rsp, 18h
>>>>> mov ebx, 0FFFFFFFFh
>>>>> cmp edi, 2
>>>>> jnz loc_100000F29
>>>>> mov rdi, [rsi+8] ; char *
>>>>> xor r14d, r14d
>>>>> xor esi, esi ; char **
>>>>> mov edx, 0Ah ; int
>>>>> call _strtol
>>>>> mov r15, rax
>>>>> shl rax, 20h
>>>>> mov rsi, offset __mh_execute_header
>>>>> add rsi, rax
>>>>> sar rsi, 20h ; size_t
>>>>> mov edi, 4 ; size_t
>>>>> call _calloc
>>>>> lea edx, [r15-1]
>>>>> movsxd r8, edx
>>>>> mov ecx, r15d
>>>>> add ecx, 0FFFFFFFEh
>>>>> js loc_100000DFA
>>>>> test r15d, r15d
>>>>> mov r11d, [rax+r8*4]
>>>>> jle loc_100000EAE
>>>>> mov ecx, r15d
>>>>> add ecx, 0FFFFFFFEh
>>>>> mov [rsp+48h+var_34], ecx
>>>>> movsxd rcx, ecx
>>>>> lea rcx, [rax+rcx*4]
>>>>> mov [rsp+48h+var_40], rcx
>>>>> lea rcx, [rax+4]
>>>>> mov [rsp+48h+var_48], rcx
>>>>> xor r14d, r14d
>>>>> jmp short loc_100000D33
>>>>> ; ---------------------------------------------------------------------------
>>>>> align 10h
>>>>>
>>>>> loc_100000D30: ; CODE XREF: _main+129 j
>>>>> ; _main+131 j ...
>>>>> add r14d, ebx
>>>>>
>>>>> loc_100000D33: ; CODE XREF: _main+92 j
>>>>> cmp r11d, edx
>>>>> lea edi, [r11+1]
>>>>> mov [rax+r8*4], edi
>>>>> mov rcx, [rsp+48h+var_40]
>>>>> mov esi, [rsp+48h+var_34]
>>>>> mov r11d, edi
>>>>> jl short loc_100000D84
>>>>> nop dword ptr [rax+00h]
>>>>>
>>>>> loc_100000D50: ; CODE XREF: _main+DA j
>>>>> mov edi, [rcx]
>>>>> lea ebp, [rdi+1]
>>>>> mov [rcx], ebp
>>>>> cmp edi, edx
>>>>> jl short loc_100000D71
>>>>> mov dword ptr [rcx], 0
>>>>> add rcx, 0FFFFFFFFFFFFFFFCh
>>>>> test esi, esi
>>>>> lea esi, [rsi-1]
>>>>> jg short loc_100000D50
>>>>> jmp loc_100000F0E
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000D71: ; CODE XREF: _main+C9 j
>>>>> test esi, esi
>>>>> js loc_100000F0E
>>>>> mov dword ptr [rax+r8*4], 0
>>>>> xor r11d, r11d
>>>>>
>>>>> loc_100000D84: ; CODE XREF: _main+BA j
>>>>> cmp r15d, 1
>>>>> mov esi, 0
>>>>> mov r9, [rsp+48h+var_48]
>>>>> mov r12d, 1
>>>>> jle short loc_100000DF0
>>>>>
>>>>> loc_100000D99: ; CODE XREF: _main+15E j
>>>>> mov r10d, [rax+rsi*4]
>>>>> mov ecx, 0FFFFFFFFh
>>>>> mov edi, 1
>>>>> mov r13, r9
>>>>> nop word ptr [rax+rax+00h]
>>>>>
>>>>> loc_100000DB0: ; CODE XREF: _main+14F j
>>>>> xor ebx, ebx
>>>>> mov ebp, r10d
>>>>> sub ebp, [r13+0]
>>>>> jz loc_100000D30
>>>>> cmp ecx, ebp
>>>>> jz loc_100000D30
>>>>> cmp edi, ebp
>>>>> jz loc_100000D30
>>>>> add r13, 4
>>>>> inc rdi
>>>>> dec ecx
>>>>> mov ebx, edi
>>>>> add ebx, esi
>>>>> cmp ebx, r15d
>>>>> jl short loc_100000DB0
>>>>> inc r12
>>>>> add r9, 4
>>>>> inc rsi
>>>>> cmp r12d, r15d
>>>>> jl short loc_100000D99
>>>>>
>>>>> loc_100000DF0: ; CODE XREF: _main+107 j
>>>>> mov ebx, 1
>>>>> jmp loc_100000D30
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000DFA: ; CODE XREF: _main+5E j
>>>>> mov ecx, [rax+r8*4]
>>>>> lea r9d, [rcx+1]
>>>>> mov [rax+r8*4], r9d
>>>>> cmp ecx, r8d
>>>>> jge loc_100000F0E
>>>>> lea r12, [rax+4]
>>>>> xor r14d, r14d
>>>>> db 2Eh
>>>>> nop word ptr [rax+rax+00000000h]
>>>>>
>>>>> loc_100000E20: ; CODE XREF: _main+216 j
>>>>> test r15d, r15d
>>>>> setle cl
>>>>> cmp r15d, 2
>>>>> jl short loc_100000E90
>>>>> test cl, cl
>>>>> mov r13d, 0
>>>>> mov r11, r12
>>>>> mov r10d, 1
>>>>> jnz short loc_100000E90
>>>>>
>>>>> loc_100000E3F: ; CODE XREF: _main+1F0 j
>>>>> mov edi, [rax+r13*4]
>>>>> mov edx, 0FFFFFFFFh
>>>>> mov ecx, 1
>>>>> mov rsi, r11
>>>>>
>>>>> loc_100000E50: ; CODE XREF: _main+1E1 j
>>>>> xor ebx, ebx
>>>>> mov ebp, edi
>>>>> sub ebp, [rsi]
>>>>> jz short loc_100000E95
>>>>> cmp edx, ebp
>>>>> jz short loc_100000E95
>>>>> cmp ecx, ebp
>>>>> jz short loc_100000E95
>>>>> add rsi, 4
>>>>> inc rcx
>>>>> dec edx
>>>>> mov ebx, ecx
>>>>> add ebx, r13d
>>>>> cmp ebx, r15d
>>>>> jl short loc_100000E50
>>>>> inc r10
>>>>> add r11, 4
>>>>> inc r13
>>>>> cmp r10d, r15d
>>>>> jl short loc_100000E3F
>>>>> db 66h, 66h, 66h, 66h, 2Eh
>>>>> nop word ptr [rax+rax+00000000h]
>>>>>
>>>>> loc_100000E90: ; CODE XREF: _main+19A j
>>>>> ; _main+1AD j
>>>>> mov ebx, 1
>>>>>
>>>>> loc_100000E95: ; CODE XREF: _main+1C6 j
>>>>> ; _main+1CA j ...
>>>>> add r14d, ebx
>>>>> cmp r9d, r8d
>>>>> lea ecx, [r9+1]
>>>>> mov [rax+r8*4], ecx
>>>>> mov r9d, ecx
>>>>> jl loc_100000E20
>>>>> jmp short loc_100000F0E
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000EAE: ; CODE XREF: _main+6B j
>>>>> add r15d, 0FFFFFFFEh
>>>>> movsxd rcx, r15d
>>>>> lea rcx, [rax+rcx*4]
>>>>> xor r14d, r14d
>>>>> jmp short loc_100000EC6
>>>>> ; ---------------------------------------------------------------------------
>>>>> align 20h
>>>>>
>>>>> loc_100000EC0: ; CODE XREF: _main+247 j
>>>>> ; _main+27C j
>>>>> inc r14d
>>>>> mov r11d, ebp
>>>>>
>>>>> loc_100000EC6: ; CODE XREF: _main+22C j
>>>>> lea ebp, [r11+1]
>>>>> mov [rax+r8*4], ebp
>>>>> cmp r11d, r8d
>>>>> mov rsi, rcx
>>>>> mov edi, r15d
>>>>> jl short loc_100000EC0
>>>>> nop dword ptr [rax+00000000h]
>>>>>
>>>>> loc_100000EE0: ; CODE XREF: _main+26A j
>>>>> mov ebp, [rsi]
>>>>> lea ebx, [rbp+1]
>>>>> mov [rsi], ebx
>>>>> cmp ebp, edx
>>>>> jl short loc_100000EFE
>>>>> mov dword ptr [rsi], 0
>>>>> add rsi, 0FFFFFFFFFFFFFFFCh
>>>>> test edi, edi
>>>>> lea edi, [rdi-1]
>>>>> jg short loc_100000EE0
>>>>> jmp short loc_100000F0E
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000EFE: ; CODE XREF: _main+259 j
>>>>> test edi, edi
>>>>> js short loc_100000F0E
>>>>> mov dword ptr [rax+r8*4], 0
>>>>> xor ebp, ebp
>>>>> jmp short loc_100000EC0
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000F0E: ; CODE XREF: _main+DC j
>>>>> ; _main+E3 j ...
>>>>> mov rdi, rax ; void *
>>>>> call _free
>>>>> lea rdi, aDSolutions ; "%d solutions\n"
>>>>> xor ebx, ebx
>>>>> xor eax, eax
>>>>> mov esi, r14d
>>>>> call _printf
>>>>>
>>>>> loc_100000F29: ; CODE XREF: _main+16 j
>>>>> mov eax, ebx
>>>>> add rsp, 18h
>>>>> pop rbx
>>>>> pop r12
>>>>> pop r13
>>>>> pop r14
>>>>> pop r15
>>>>> pop rbp
>>>>> retn
>>>>> _main endp
>>>>> ```
>>>>>
>>>>> clang 3.6's result:
>>>>>
>>>>> ```
>>>>> public _main
>>>>> _main proc near
>>>>>
>>>>> var_60 = qword ptr -60h
>>>>> var_58 = qword ptr -58h
>>>>> var_50 = qword ptr -50h
>>>>> var_48 = qword ptr -48h
>>>>> var_40 = qword ptr -40h
>>>>> var_38 = qword ptr -38h
>>>>>
>>>>> push rbp
>>>>> push r15
>>>>> push r14
>>>>> push r13
>>>>> push r12
>>>>> push rbx
>>>>> sub rsp, 38h
>>>>> mov ebx, 0FFFFFFFFh
>>>>> cmp edi, 2
>>>>> jnz loc_100000F23
>>>>> mov rbx, offset __mh_execute_header
>>>>> mov rdi, [rsi+8] ; char *
>>>>> xor r13d, r13d
>>>>> xor esi, esi ; char **
>>>>> mov edx, 0Ah ; int
>>>>> call _strtol
>>>>> mov r14, rax
>>>>> shl rax, 20h
>>>>> mov [rsp+68h+var_38], rax
>>>>> lea rsi, [rax+rbx]
>>>>> sar rsi, 20h ; size_t
>>>>> mov edi, 4 ; size_t
>>>>> call _calloc
>>>>> lea r11d, [r14-1]
>>>>> movsxd r12, r11d
>>>>> mov [rsp+68h+var_40], r12
>>>>> movsxd rcx, r14d
>>>>> mov [rsp+68h+var_50], rcx
>>>>> add ecx, 0FFFFFFFEh
>>>>> js loc_100000E1A
>>>>> mov ecx, r14d
>>>>> add ecx, 0FFFFFFFEh
>>>>> movsxd rcx, ecx
>>>>> inc rcx
>>>>> mov [rsp+68h+var_58], rcx
>>>>> mov rcx, rax
>>>>> add rcx, 4
>>>>> mov [rsp+68h+var_60], rcx
>>>>> xor ebp, ebp
>>>>> jmp short loc_100000D17
>>>>> ; ---------------------------------------------------------------------------
>>>>> align 10h
>>>>>
>>>>> loc_100000D10: ; CODE XREF: _main+15B j
>>>>> ; _main+163 j ...
>>>>> mov rbp, [rsp+68h+var_48]
>>>>> add ebp, edi
>>>>>
>>>>> loc_100000D17: ; CODE XREF: _main+93 j
>>>>> cmp r13d, r11d
>>>>> lea edx, [r13+1]
>>>>> mov [rax+r12*4], edx
>>>>> mov rcx, [rsp+68h+var_58]
>>>>> mov r13d, edx
>>>>> jl short loc_100000D6B
>>>>> nop dword ptr [rax+00h]
>>>>>
>>>>> loc_100000D30: ; CODE XREF: _main+DE j
>>>>> mov edx, [rax+rcx*4-4]
>>>>> lea esi, [rdx+1]
>>>>> mov [rax+rcx*4-4], esi
>>>>> cmp edx, r11d
>>>>> jl short loc_100000D60
>>>>> mov dword ptr [rax+rcx*4-4], 0
>>>>> dec rcx
>>>>> test rcx, rcx
>>>>> jg short loc_100000D30
>>>>> jmp loc_100000F09
>>>>> ; ---------------------------------------------------------------------------
>>>>> align 20h
>>>>>
>>>>> loc_100000D60: ; CODE XREF: _main+CE j
>>>>> mov dword ptr [rax+r12*4], 0
>>>>> xor r13d, r13d
>>>>>
>>>>> loc_100000D6B: ; CODE XREF: _main+BA j
>>>>> mov [rsp+68h+var_48], rbp
>>>>> test r14d, r14d
>>>>> setle cl
>>>>> mov rdx, offset __mh_execute_header
>>>>> lea rdx, [rdx+1]
>>>>> cmp [rsp+68h+var_38], rdx
>>>>> jl loc_100000E10
>>>>> test cl, cl
>>>>> mov edx, 0
>>>>> mov r10, [rsp+68h+var_60]
>>>>> mov r9d, 1
>>>>> jnz short loc_100000E10
>>>>>
>>>>> loc_100000DA3: ; CODE XREF: _main+195 j
>>>>> mov esi, [rax+rdx*4]
>>>>> mov r15d, 0FFFFFFFFh
>>>>> mov r8d, 1
>>>>> mov rcx, r10
>>>>> db 66h, 66h, 2Eh
>>>>> nop dword ptr [rax+rax+00000000h]
>>>>>
>>>>> loc_100000DC0: ; CODE XREF: _main+184 j
>>>>> mov ebx, [rcx]
>>>>> mov ebp, esi
>>>>> sub ebp, ebx
>>>>> xor edi, edi
>>>>> cmp r8d, ebp
>>>>> jz loc_100000D10
>>>>> cmp esi, ebx
>>>>> jz loc_100000D10
>>>>> cmp r15d, ebp
>>>>> jz loc_100000D10
>>>>> add rcx, 4
>>>>> inc r8
>>>>> dec r15d
>>>>> mov edi, r8d
>>>>> add edi, edx
>>>>> cmp edi, r14d
>>>>> jl short loc_100000DC0
>>>>> inc r9
>>>>> add r10, 4
>>>>> inc rdx
>>>>> cmp r9, [rsp+68h+var_50]
>>>>> jl short loc_100000DA3
>>>>> nop word ptr [rax+rax+00000000h]
>>>>>
>>>>> loc_100000E10: ; CODE XREF: _main+119 j
>>>>> ; _main+131 j
>>>>> mov edi, 1
>>>>> jmp loc_100000D10
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000E1A: ; CODE XREF: _main+6E j
>>>>> test r14d, r14d
>>>>> jle loc_100000F00
>>>>> mov dword ptr [rax+r12*4], 1
>>>>> xor ebp, ebp
>>>>> cmp r14d, 2
>>>>> jl loc_100000F09
>>>>> mov rcx, rax
>>>>> add rcx, 4
>>>>> mov [rsp+68h+var_48], rcx
>>>>> xor ebp, ebp
>>>>> mov r15d, 1
>>>>> nop dword ptr [rax+rax+00h]
>>>>>
>>>>> loc_100000E50: ; CODE XREF: _main+288 j
>>>>> mov rbx, rbp
>>>>> mov rcx, offset __mh_execute_header
>>>>> cmp [rsp+68h+var_38], rcx
>>>>> mov edx, 0
>>>>> mov r13, [rsp+68h+var_48]
>>>>> mov r8d, 1
>>>>> mov r9d, 1
>>>>> jle short loc_100000EE0
>>>>>
>>>>> loc_100000E7A: ; CODE XREF: _main+25A j
>>>>> mov r12d, [rax+rdx*4]
>>>>> mov edi, 0FFFFFFFFh
>>>>> mov ecx, 1
>>>>> mov rsi, r13
>>>>> nop dword ptr [rax+rax+00h]
>>>>>
>>>>> loc_100000E90: ; CODE XREF: _main+249 j
>>>>> mov r10d, [rsi]
>>>>> mov ebp, r12d
>>>>> sub ebp, r10d
>>>>> xor r9d, r9d
>>>>> cmp ecx, ebp
>>>>> jz short loc_100000EE0
>>>>> cmp r12d, r10d
>>>>> jz short loc_100000EE0
>>>>> cmp edi, ebp
>>>>> jz short loc_100000EE0
>>>>> add rsi, 4
>>>>> inc rcx
>>>>> dec edi
>>>>> mov ebp, ecx
>>>>> add ebp, edx
>>>>> cmp ebp, r14d
>>>>> jl short loc_100000E90
>>>>> inc r8
>>>>> add r13, 4
>>>>> inc rdx
>>>>> cmp r8, [rsp+68h+var_50]
>>>>> jl short loc_100000E7A
>>>>> mov r9d, 1
>>>>> db 66h, 66h, 66h, 66h, 2Eh
>>>>> nop word ptr [rax+rax+00000000h]
>>>>>
>>>>> loc_100000EE0: ; CODE XREF: _main+208 j
>>>>> ; _main+22E j ...
>>>>> mov rbp, rbx
>>>>> add ebp, r9d
>>>>> cmp r15d, r11d
>>>>> lea ecx, [r15+1]
>>>>> mov rdx, [rsp+68h+var_40]
>>>>> mov [rax+rdx*4], ecx
>>>>> mov r15d, ecx
>>>>> jl loc_100000E50
>>>>> jmp short loc_100000F09
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000F00: ; CODE XREF: _main+1AD j
>>>>> xor ebp, ebp
>>>>> test r11d, r11d
>>>>> cmovns ebp, r11d
>>>>>
>>>>> loc_100000F09: ; CODE XREF: _main+E0 j
>>>>> ; _main+1C1 j ...
>>>>> mov rdi, rax ; void *
>>>>> call _free
>>>>> lea rdi, aDSolutions ; "%d solutions\n"
>>>>> xor ebx, ebx
>>>>> xor eax, eax
>>>>> mov esi, ebp
>>>>> call _printf
>>>>>
>>>>> loc_100000F23: ; CODE XREF: _main+16 j
>>>>> mov eax, ebx
>>>>> add rsp, 38h
>>>>> pop rbx
>>>>> pop r12
>>>>> pop r13
>>>>> pop r14
>>>>> pop r15
>>>>> pop rbp
>>>>> retn
>>>>> _main endp
>>>>> ```
>>>>>
>>>>> gcc-4.9.2's result:
>>>>> ```
>>>>>
>>>>> _main proc near
>>>>>
>>>>> var_48 = qword ptr -48h
>>>>> var_40 = dword ptr -40h
>>>>> var_3C = dword ptr -3Ch
>>>>>
>>>>> cmp edi, 2
>>>>> jz short loc_100000D69
>>>>> or eax, 0FFFFFFFFh
>>>>> retn
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000D69: ; CODE XREF: _main+3 j
>>>>> push r15
>>>>> mov edx, 0Ah ; int
>>>>> push r14
>>>>> push r13
>>>>> push r12
>>>>> push rbp
>>>>> push rbx
>>>>> sub rsp, 18h
>>>>> mov rdi, [rsi+8] ; char *
>>>>> xor esi, esi ; char **
>>>>> call _strtol
>>>>> mov edi, 4 ; size_t
>>>>> lea esi, [rax+1]
>>>>> mov r14, rax
>>>>> mov ebx, eax
>>>>> lea r15d, [r14-2]
>>>>> movsxd rsi, esi ; size_t
>>>>> call _calloc
>>>>> mov [rsp+48h+var_3C], 0
>>>>> mov rdi, rax ; void *
>>>>> lea eax, [r14-1]
>>>>> cdqe
>>>>> lea r13, [rdi+rax*4]
>>>>> movsxd rax, r15d
>>>>> mov ebp, [r13+0]
>>>>> shl rax, 2
>>>>> lea r12, [rdi+rax]
>>>>> lea rax, [rdi+rax-4]
>>>>> mov [rsp+48h+var_48], rax
>>>>> mov eax, r14d
>>>>> lea r14d, [r14+1]
>>>>> nop word ptr [rax+rax+00h]
>>>>> nop word ptr [rax+rax+00h]
>>>>>
>>>>> loc_100000DE0: ; CODE XREF: _main+12B j
>>>>> ; _main+155 j ...
>>>>> add ebp, 1
>>>>> cmp ebx, ebp
>>>>> mov [r13+0], ebp
>>>>> jg short loc_100000E62
>>>>> test r15d, r15d
>>>>> js short loc_100000E33
>>>>> mov ecx, [r12]
>>>>> lea edx, [rcx+1]
>>>>> cmp ebx, edx
>>>>> mov [r12], edx
>>>>> jg short loc_100000E58
>>>>> mov r8, r12
>>>>> mov rcx, [rsp+48h+var_48]
>>>>> mov esi, r15d
>>>>> jmp short loc_100000E24
>>>>> ; ---------------------------------------------------------------------------
>>>>> align 10h
>>>>>
>>>>> loc_100000E10: ; CODE XREF: _main+D1 j
>>>>> mov edx, [rcx]
>>>>> sub r8, 4
>>>>> sub rcx, 4
>>>>> add edx, 1
>>>>> mov [rcx+4], edx
>>>>> cmp ebx, edx
>>>>> jg short loc_100000E58
>>>>>
>>>>> loc_100000E24: ; CODE XREF: _main+A9 j
>>>>> sub esi, 1
>>>>> mov dword ptr [r8], 0
>>>>> cmp esi, 0FFFFFFFFh
>>>>> jnz short loc_100000E10
>>>>>
>>>>> loc_100000E33: ; CODE XREF: _main+8E j
>>>>> call _free
>>>>> mov esi, [rsp+48h+var_3C]
>>>>> add rsp, 18h
>>>>> xor eax, eax
>>>>> pop rbx
>>>>> lea rdi, aDSolutions ; "%d solutions\n"
>>>>> pop rbp
>>>>> pop r12
>>>>> pop r13
>>>>> pop r14
>>>>> pop r15
>>>>> jmp _printf
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000E58: ; CODE XREF: _main+9D j
>>>>> ; _main+C2 j
>>>>> mov dword ptr [r13+0], 0
>>>>> xor ebp, ebp
>>>>>
>>>>> loc_100000E62: ; CODE XREF: _main+89 j
>>>>> test ebx, ebx
>>>>> jle loc_100000EE6
>>>>> lea r11, [rdi+8]
>>>>> xor r10d, r10d
>>>>>
>>>>> loc_100000E71: ; CODE XREF: _main+184 j
>>>>> add r10d, 1
>>>>> cmp r10d, eax
>>>>> jz short loc_100000EE6
>>>>> mov r8d, [r11-8]
>>>>> mov edx, r8d
>>>>> sub edx, [r11-4]
>>>>> add edx, 1
>>>>> cmp edx, 2
>>>>> jbe loc_100000DE0
>>>>> mov r9d, r14d
>>>>> mov rcx, r11
>>>>> mov edx, 1
>>>>> mov [rsp+48h+var_40], r10d
>>>>> sub r9d, r10d
>>>>> jmp short loc_100000ED3
>>>>> ; ---------------------------------------------------------------------------
>>>>> align 10h
>>>>>
>>>>> loc_100000EB0: ; CODE XREF: _main+179 j
>>>>> mov esi, r8d
>>>>> sub esi, [rcx]
>>>>> jz loc_100000DE0
>>>>> mov r10d, esi
>>>>> add rcx, 4
>>>>> add r10d, edx
>>>>> jz loc_100000DE0
>>>>> cmp esi, edx
>>>>> jz loc_100000DE0
>>>>>
>>>>> loc_100000ED3: ; CODE XREF: _main+144 j
>>>>> add edx, 1
>>>>> cmp edx, r9d
>>>>> jnz short loc_100000EB0
>>>>> mov r10d, [rsp+48h+var_40]
>>>>> add r11, 4
>>>>> jmp short loc_100000E71
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_100000EE6: ; CODE XREF: _main+104 j
>>>>> ; _main+118 j
>>>>> add [rsp+48h+var_3C], 1
>>>>> jmp loc_100000DE0
>>>>> _main endp
>>>>> ```
>>>>>
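One detail in the gcc listing above: as I read it, the `add edx, 1` / `cmp edx, 2` / `jbe` sequence at loc_100000E71 handles the adjacent-column case (x == 1) of the test in validate() with a single unsigned range check instead of three compares. A sketch of that equivalence in C; the function names are mine, and this is my interpretation of the disassembly rather than anything confirmed from gcc's sources:

```c
/* The check as written in validate(): d is a[i] - a[j], x is j - i. */
int conflict_naive(int d, int x)
{
    return d == 0 || d == -x || d == x;
}

/*
 * Range-check form for x == 1: d is -1, 0 or 1 exactly when
 * d + 1, computed in unsigned arithmetic, is at most 2.
 * The unsigned cast keeps the addition well defined for every int.
 */
int conflict_adjacent(int d)
{
    return (unsigned)d + 1u <= 2u;
}
```

For any `d`, `conflict_adjacent(d)` should agree with `conflict_naive(d, 1)`; neither clang listing appears to use this form.
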
>>>>> MSVC 10.0's result:
>>>>>
>>>>> ```
>>>>>
>>>>> _main proc near ; CODE XREF: ___tmainCRTStartup+106 p
>>>>>
>>>>> var_80 = dword ptr -80h
>>>>> var_7C = dword ptr -7Ch
>>>>> var_78 = dword ptr -78h
>>>>> var_74 = dword ptr -74h
>>>>> var_70 = dword ptr -70h
>>>>> var_6C = dword ptr -6Ch
>>>>> var_68 = dword ptr -68h
>>>>> var_64 = dword ptr -64h
>>>>> var_60 = dword ptr -60h
>>>>> var_5C = dword ptr -5Ch
>>>>> argc = dword ptr 8
>>>>> argv = dword ptr 0Ch
>>>>> envp = dword ptr 10h
>>>>>
>>>>> push ebp
>>>>> mov ebp, esp
>>>>> and esp, 0FFFFFF80h
>>>>> push esi
>>>>> push edi
>>>>> push ebx
>>>>> sub esp, 74h
>>>>> push 3
>>>>> call sub_4080F0
>>>>> add esp, 4
>>>>> stmxcsr [esp+80h+var_80]
>>>>> or [esp+80h+var_80], 8000h
>>>>> ldmxcsr [esp+80h+var_80]
>>>>> cmp [ebp+argc], 2
>>>>> jz short loc_40103A
>>>>> mov eax, 0FFFFFFFFh
>>>>> add esp, 74h
>>>>> pop ebx
>>>>> pop edi
>>>>> pop esi
>>>>> mov esp, ebp
>>>>> pop ebp
>>>>> retn
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_40103A: ; CODE XREF: _main+29 j
>>>>> call ds:GetTickCount
>>>>> mov esi, eax
>>>>> mov eax, [ebp+argv]
>>>>> push dword ptr [eax+4] ; char *
>>>>> call _atoi
>>>>> mov edi, eax
>>>>> lea eax, [edi+1]
>>>>> push eax ; size_t
>>>>> push 4 ; size_t
>>>>> call _calloc
>>>>> add esp, 0Ch
>>>>> mov ecx, [eax+edi*4-4]
>>>>> lea edx, [edi-1]
>>>>> mov [esp+80h+var_6C], ecx
>>>>> xor ebx, ebx
>>>>> mov [esp+80h+var_7C], ebx
>>>>> lea ecx, [eax+edi*4]
>>>>> mov [esp+80h+var_74], ecx
>>>>> lea ecx, [edi-2]
>>>>> mov [esp+80h+var_70], ecx
>>>>> mov [esp+80h+var_60], edx
>>>>> mov [esp+80h+var_80], esi
>>>>> mov ecx, [esp+80h+var_6C]
>>>>>
>>>>> loc_401087: ; CODE XREF: _main+142 j
>>>>> ; _main+193 j
>>>>> mov edx, [esp+80h+var_60]
>>>>> inc ecx
>>>>> mov [eax+edi*4-4], ecx
>>>>> cmp edi, [eax+edx*4]
>>>>> jg short loc_4010DC
>>>>> mov esi, [esp+80h+var_70]
>>>>> test esi, esi
>>>>> js short loc_4010CE
>>>>> xor edx, edx
>>>>> mov [esp+80h+var_78], eax
>>>>> xor ebx, ebx
>>>>> mov eax, [esp+80h+var_74]
>>>>>
>>>>> loc_4010A9: ; CODE XREF: _main+C8 j
>>>>> mov ecx, [eax+ebx*4-8]
>>>>> inc ecx
>>>>> cmp ecx, edi
>>>>> jl loc_40117A
>>>>> inc edx
>>>>> lea esi, [ebx+edi-3]
>>>>> mov dword ptr [eax+ebx*4-8], 0
>>>>> dec ebx
>>>>> cmp edx, [esp+80h+var_60]
>>>>> jb short loc_4010A9
>>>>> mov eax, [esp+80h+var_78]
>>>>>
>>>>> loc_4010CE: ; CODE XREF: _main+9B j
>>>>> ; _main+186 j
>>>>> test esi, esi
>>>>> jl short loc_401147
>>>>> mov dword ptr [eax+edi*4-4], 0
>>>>> xor ecx, ecx
>>>>>
>>>>> loc_4010DC: ; CODE XREF: _main+93 j
>>>>> test edi, edi
>>>>> jle short loc_40113E
>>>>> mov [esp+80h+var_6C], ecx
>>>>> xor edx, edx
>>>>> mov [esp+80h+var_5C], edi
>>>>>
>>>>> loc_4010EA: ; CODE XREF: _main+132 j
>>>>> lea ecx, [edx+1]
>>>>> mov ebx, ecx
>>>>> mov esi, ebx
>>>>> cmp ecx, [esp+80h+var_5C]
>>>>> jge short loc_401130
>>>>> mov edx, [eax+edx*4]
>>>>> mov edi, 1
>>>>> mov [esp+80h+var_64], esi
>>>>> mov [esp+80h+var_68], ecx
>>>>>
>>>>> loc_401107: ; CODE XREF: _main+122 j
>>>>> mov esi, [eax+ebx*4]
>>>>> cmp edx, esi
>>>>> jz short loc_40118B
>>>>> sub esi, edx
>>>>> mov ecx, esi
>>>>> neg ecx
>>>>> cmp edi, ecx
>>>>> jz short loc_40118B
>>>>> cmp esi, edi
>>>>> jz short loc_40118B
>>>>> inc ebx
>>>>> inc edi
>>>>> cmp ebx, [esp+80h+var_5C]
>>>>> jl short loc_401107
>>>>> mov ecx, [esp+80h+var_68]
>>>>> mov esi, [esp+80h+var_64]
>>>>> cmp ecx, [esp+80h+var_5C]
>>>>>
>>>>> loc_401130: ; CODE XREF: _main+F5 j
>>>>> mov edx, esi
>>>>> jl short loc_4010EA
>>>>> xchg ax, ax
>>>>> mov ecx, [esp+80h+var_6C]
>>>>> mov edi, [esp+80h+var_5C]
>>>>>
>>>>> loc_40113E: ; CODE XREF: _main+DE j
>>>>> inc [esp+80h+var_7C]
>>>>> jmp loc_401087
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_401147: ; CODE XREF: _main+D0 j
>>>>> mov ebx, [esp+80h+var_7C]
>>>>> mov esi, [esp+80h+var_80]
>>>>> push eax ; void *
>>>>> call _free
>>>>> add esp, 4
>>>>> call ds:GetTickCount
>>>>> sub eax, esi
>>>>> push eax
>>>>> push ebx
>>>>> push offset aDSolutionsInDM ; "%d solutions in %d msecs.\n"
>>>>> call _printf
>>>>> xor eax, eax
>>>>> add esp, 80h
>>>>> pop ebx
>>>>> pop edi
>>>>> pop esi
>>>>> mov esp, ebp
>>>>> pop ebp
>>>>> retn
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_40117A: ; CODE XREF: _main+B0 j
>>>>> mov edx, [esp+80h+var_74]
>>>>> mov eax, [esp+80h+var_78]
>>>>> mov [edx+ebx*4-8], ecx
>>>>> jmp loc_4010CE
>>>>> ; ---------------------------------------------------------------------------
>>>>>
>>>>> loc_40118B: ; CODE XREF: _main+10C j
>>>>> ; _main+116 j ...
>>>>> mov ecx, [esp+80h+var_6C]
>>>>> mov edi, [esp+80h+var_5C]
>>>>> jmp loc_401087
>>>>> _main endp
>>>>> ```
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev