[LLVMdev] trunk's optimizer generates slower code than 3.5
Jack Howarth
howarth.mailing.lists at gmail.com
Sat Feb 14 09:31:46 PST 2015
Oops. I misspoke. The 22% performance regression is in fact eliminated
in current llvm/clang trunk. Hopefully this is due to a single fix
that can be back ported rather than some large change in the code.
On Sat, Feb 14, 2015 at 12:18 PM, Jack Howarth
<howarth.mailing.lists at gmail.com> wrote:
> The same 22% performance regression also exists in current llvm/clang
> trunk for the SciMark2 Sparse matmult benchmark.
>
> On Sat, Feb 14, 2015 at 12:11 PM, Jack Howarth
> <howarth.mailing.lists at gmail.com> wrote:
>> Using the SciMark 2.0 code from
>> http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the
>> same...
>>
>> make CFLAGS="-O3 -march=native"
>>
>> I am able to reproduce the 22% performance regression in the run time
>> of the Sparse matmult benchmark.
>> For 10 runs of the scimark2 benchmark, I get 998.439+/-0.4828 with
>> the release llvm clang 3.5.1 compiler
>> and 1217.363+/-1.1004 for the current clang 3.6svn from 3.6 branch. Not good.
>> Jack
>>
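For anyone repeating the 10-run comparison, here is a minimal sketch of how per-run scores could be reduced to a mean and a spread. It assumes the quoted "+/-" figures are a standard deviation, which the message does not state; the function names are made up for illustration. The square root of the variance gives the standard deviation. For reference, the ratio of the two quoted means is 1217.363 / 998.439, or about 1.219, which is where the 22% figure comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum / (double) n;
}

/* Sample variance (n - 1 in the denominator).
 * Take the square root of the result to get the standard deviation. */
static double sample_variance(const double *x, size_t n)
{
    assert(n > 1);
    const double m = mean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; ++i)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double) (n - 1);
}
```

Feeding the ten scores from each compiler into these two helpers would give numbers comparable to the ones quoted above, provided the same definition of spread was used.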
>> On Sat, Feb 14, 2015 at 11:19 AM, Jack Howarth
>> <howarth.mailing.lists at gmail.com> wrote:
>>> Do any of the build-bots routinely run the SciMark v2.0 benchmark?
>>> If so, might not an examination of those logs reveal the commit range
>>> at which the optimizations in that benchmark degraded?
>>> Jack
>>>
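If such logs exist, narrowing the commit range is a binary search. The sketch below is only an illustration: it assumes one measured run time per revision, ordered oldest to newest, and that once the regression appears it stays in every later revision. The array, the threshold and the function name are hypothetical and do not correspond to any real build-bot output.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first revision whose run time exceeds
 * `threshold`, or n if none does. Assumes every revision after the
 * first regressed one is regressed as well. */
static size_t first_regressed(const double *run_time, size_t n, double threshold)
{
    assert(run_time != NULL || n == 0);
    size_t lo = 0, hi = n;                  /* the answer lies in [lo, hi] */
    while (lo < hi) {
        const size_t mid = lo + (hi - lo) / 2;
        if (run_time[mid] > threshold)
            hi = mid;                       /* mid is regressed: look earlier */
        else
            lo = mid + 1;                   /* mid is still good: look later */
    }
    return lo;
}
```

With noisy measurements the threshold would need to sit well clear of both the old and the new level, otherwise the search can land on the wrong revision.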
>>> On Sat, Feb 14, 2015 at 11:13 AM, Jack Howarth
>>> <howarth.mailing.lists at gmail.com> wrote:
>>>> The regressions in the performance of generated code, introduced
>>>> by the llvm 3.6 release, don't seem to be limited to this "8 queens
>>>> puzzle" solver test case. See...
>>>>
>>>> http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1
>>>>
>>>> where a big hit in the performance of the Sparse Matrix Multiply test
>>>> of the SciMark v2.0 benchmark was observed, as well as others.
>>>> Do you really want to release 3.6 with this level of performance regression?
>>>> Jack
>>>>
>>>> On Fri, Feb 13, 2015 at 2:47 PM, Jack Howarth
>>>> <howarth.mailing.lists at gmail.com> wrote:
>>>>> Also confirmed with the llvm 3.5.1 release and the llvm 3.6 release
>>>>> branch on x86_64-apple-darwin14...
>>>>>
>>>>> % clang-3.5 -O3 -mssse3 -fomit-frame-pointer -fno-stack-protector
>>>>> -fno-exceptions -o 8 8.c
>>>>> % time ./8 9
>>>>> 352 solutions
>>>>> 3.603u 0.002s 0:03.60 100.0% 0+0k 0+0io 2pf+0w
>>>>> % time ./8 10
>>>>> 724 solutions
>>>>> 104.217u 0.059s 1:44.30 99.9% 0+0k 0+0io 2pf+0w
>>>>>
>>>>> % clang-3.6 -O3 -mssse3 -fomit-frame-pointer -fno-stack-protector
>>>>> -fno-exceptions -o 8 8.c
>>>>> % time ./8 9
>>>>> 352 solutions
>>>>> 4.050u 0.001s 0:04.05 100.0% 0+0k 0+0io 2pf+0w
>>>>> % time ./8 10
>>>>> 724 solutions
>>>>> 114.808u 0.041s 1:54.86 99.9% 0+0k 0+0io 2pf+0w
>>>>>
>>>>> On Fri, Feb 13, 2015 at 3:37 AM, 191919 <191919 at gmail.com> wrote:
>>>>>> I submitted the problem report to clang's bugzilla but no one seems to
>>>>>> care so I have to send it to the mailing list.
>>>>>>
>>>>>> clang 3.7 svn (trunk 229055 at the time I reported this problem)
>>>>>> generates slower code than 3.5 (Apple LLVM version 6.0
>>>>>> (clang-600.0.56) (based on LLVM 3.5svn)) for the following code.
>>>>>>
>>>>>> It is an "8 queens puzzle" solver written as an educational example. As
>>>>>> compiled by both clang 3.5 and 3.7, it gave the correct answer, but
>>>>>> clang 3.5 generates code which runs 20% faster than 3.6/3.7.
>>>>>>
>>>>>> ##########################################
>>>>>> # clang 3.5 which comes with Xcode 6.1.1
>>>>>> ##########################################
>>>>>> $ clang -O3 -mssse3 -fomit-frame-pointer -fno-stack-protector
>>>>>> -fno-exceptions -o 8 8.c
>>>>>> $ time ./8 9 # 9 queens
>>>>>> 352 solutions
>>>>>> ./8 9 1.63s user 0.00s system 99% cpu 1.632 total
>>>>>> $ time ./8 10 # 10 queens
>>>>>> 724 solutions
>>>>>> ./8 10 45.11s user 0.01s system 99% cpu 45.121 total
>>>>>>
>>>>>> ##########################################
>>>>>> # clang 3.7 svn trunk
>>>>>> ##########################################
>>>>>> $ /opt/bin/clang -O3 -mssse3 -fomit-frame-pointer -fno-stack-protector
>>>>>> -fno-exceptions -o 8 8.c
>>>>>> $ time ./8 9 # 9 queens
>>>>>> 352 solutions
>>>>>> ./8 9 2.07s user 0.00s system 99% cpu 2.078 total
>>>>>> $ time ./8 10 # 10 queens
>>>>>> 724 solutions
>>>>>> ./8 10 56.63s user 0.02s system 99% cpu 56.650 total
>>>>>>
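The timings above can be read two ways, and the two give different percentages. A small sketch of both conventions follows; the function names are made up for illustration.

```c
#include <assert.h>

/* Extra run time of the new build, as a percentage of the old run time. */
static double slowdown_percent(double old_secs, double new_secs)
{
    assert(old_secs > 0.0);
    return 100.0 * (new_secs - old_secs) / old_secs;
}

/* Run time saved by the old build, as a percentage of the new run time. */
static double time_saved_percent(double old_secs, double new_secs)
{
    assert(new_secs > 0.0);
    return 100.0 * (new_secs - old_secs) / new_secs;
}
```

For the 10 queens case, slowdown_percent(45.11, 56.63) comes to roughly 25.5 and time_saved_percent(45.11, 56.63) to roughly 20.3. The "20% faster" figure matches the second convention; stated the other way round, the trunk build takes about 25% longer.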
>>>>>> The source code is below. I also attached the executable files as well
>>>>>> as the assembly code files for clang 3.5 and 3.6, as disassembled by IDA.
>>>>>>
>>>>>> The performance is even worse when compiling as 32-bit code while
>>>>>> gcc-4.9.2 is not affected.
>>>>>>
>>>>>> ########## clang-3.5
>>>>>> $ clang -m32 -O3 -fomit-frame-pointer -fno-stack-protector
>>>>>> -fno-exceptions -o 8 8.c
>>>>>> $ time ./8 9
>>>>>> 352 solutions
>>>>>> ./8 9 1.95s user 0.00s system 99% cpu 1.950 total
>>>>>>
>>>>>> ########## clang-3.7
>>>>>> $ /opt/bin/clang -m32 -O3 -fomit-frame-pointer -fno-stack-protector
>>>>>> -fno-exceptions -o 8 8.c
>>>>>> $ time ./8 9
>>>>>> 352 solutions
>>>>>> ./8 9 2.48s user 0.00s system 99% cpu 2.480 total
>>>>>>
>>>>>> ######### gcc-4.9.2
>>>>>> $ /opt/bin/gcc -m32 -O3 -fomit-frame-pointer -fno-stack-protector
>>>>>> -fno-exceptions -o 8 8.c
>>>>>> $ time ./8 9
>>>>>> 352 solutions
>>>>>> ./8 9 1.44s user 0.00s system 99% cpu 1.442 total
>>>>>>
>>>>>>
>>>>>> ```
>>>>>> #include <stdio.h>
>>>>>> #include <stdlib.h>
>>>>>>
>>>>>> static inline int validate(int* a, int d)
>>>>>> {
>>>>>>     int i, j, x;
>>>>>>     for (i = 0; i < d; ++i)
>>>>>>     {
>>>>>>         for (j = i+1, x = 1; j < d; ++j, ++x)
>>>>>>         {
>>>>>>             /* note: this inner d shadows the parameter d */
>>>>>>             const int d = a[i] - a[j];
>>>>>>             if (d == 0 || d == -x || d == x) return 0;
>>>>>>         }
>>>>>>     }
>>>>>>     return 1;
>>>>>> }
>>>>>>
>>>>>> static inline int solve(int d)
>>>>>> {
>>>>>>     int r = 0;
>>>>>>     int* a = (int*) calloc(sizeof(int), d+1);
>>>>>>     int p = d - 1;
>>>>>>
>>>>>>     for (;;)
>>>>>>     {
>>>>>>         a[p]++;
>>>>>>
>>>>>>         if (a[p] > d-1)
>>>>>>         {
>>>>>>             /* carry into the lower positions, like an odometer */
>>>>>>             int bp = p - 1;
>>>>>>             while (bp >= 0)
>>>>>>             {
>>>>>>                 a[bp]++;
>>>>>>                 if (a[bp] <= d-1) break;
>>>>>>                 a[bp] = 0;
>>>>>>                 --bp;
>>>>>>             }
>>>>>>             if (bp < 0)
>>>>>>                 break;
>>>>>>             a[p] = 0;
>>>>>>         }
>>>>>>         if (validate(a, d))
>>>>>>         {
>>>>>>             ++r;
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     free(a);
>>>>>>     return r;
>>>>>> }
>>>>>>
>>>>>> int main(int argc, char** argv)
>>>>>> {
>>>>>>     if (argc != 2) return -1;
>>>>>>     int r = solve((int) strtol(argv[1], NULL, 10));
>>>>>>     printf("%d solutions\n", r);
>>>>>> }
>>>>>> ```
>>>>>>
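The solution counts printed in the runs above (352 for 9 queens, 724 for 10) can be cross-checked with an independent counter. The following is a plain row-by-row backtracking sketch, not the test case itself and not meant for timing comparisons; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Place queens row by row; col[r] is the column of the queen in row r.
 * Returns the number of complete placements reachable from this state. */
static int place(int *col, int row, int n)
{
    assert(row >= 0 && row <= n);
    if (row == n)
        return 1;                       /* all rows filled: one solution */
    int count = 0;
    for (int c = 0; c < n; ++c) {
        int ok = 1;
        for (int r = 0; r < row && ok; ++r) {
            const int diff = col[r] - c;
            /* same column, or same diagonal in either direction */
            if (diff == 0 || diff == row - r || diff == r - row)
                ok = 0;
        }
        if (ok) {
            col[row] = c;
            count += place(col, row + 1, n);
        }
    }
    return count;
}

/* Number of n-queens solutions, or -1 on bad input or allocation failure. */
static int count_queens(int n)
{
    if (n < 1)
        return -1;
    int *col = calloc((size_t) n, sizeof *col);
    if (col == NULL)
        return -1;
    const int total = place(col, 0, n);
    free(col);
    return total;
}
```

Because this version prunes invalid partial placements early, it visits far fewer states than the exhaustive enumeration in the test case, so its run time says nothing about the regression being discussed.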
>>>>>> clang 3.5's result:
>>>>>>
>>>>>> ```
>>>>>> public _main
>>>>>> _main proc near
>>>>>>
>>>>>> var_48 = qword ptr -48h
>>>>>> var_40 = qword ptr -40h
>>>>>> var_34 = dword ptr -34h
>>>>>>
>>>>>> push rbp
>>>>>> push r15
>>>>>> push r14
>>>>>> push r13
>>>>>> push r12
>>>>>> push rbx
>>>>>> sub rsp, 18h
>>>>>> mov ebx, 0FFFFFFFFh
>>>>>> cmp edi, 2
>>>>>> jnz loc_100000F29
>>>>>> mov rdi, [rsi+8] ; char *
>>>>>> xor r14d, r14d
>>>>>> xor esi, esi ; char **
>>>>>> mov edx, 0Ah ; int
>>>>>> call _strtol
>>>>>> mov r15, rax
>>>>>> shl rax, 20h
>>>>>> mov rsi, offset __mh_execute_header
>>>>>> add rsi, rax
>>>>>> sar rsi, 20h ; size_t
>>>>>> mov edi, 4 ; size_t
>>>>>> call _calloc
>>>>>> lea edx, [r15-1]
>>>>>> movsxd r8, edx
>>>>>> mov ecx, r15d
>>>>>> add ecx, 0FFFFFFFEh
>>>>>> js loc_100000DFA
>>>>>> test r15d, r15d
>>>>>> mov r11d, [rax+r8*4]
>>>>>> jle loc_100000EAE
>>>>>> mov ecx, r15d
>>>>>> add ecx, 0FFFFFFFEh
>>>>>> mov [rsp+48h+var_34], ecx
>>>>>> movsxd rcx, ecx
>>>>>> lea rcx, [rax+rcx*4]
>>>>>> mov [rsp+48h+var_40], rcx
>>>>>> lea rcx, [rax+4]
>>>>>> mov [rsp+48h+var_48], rcx
>>>>>> xor r14d, r14d
>>>>>> jmp short loc_100000D33
>>>>>> ; ---------------------------------------------------------------------------
>>>>>> align 10h
>>>>>>
>>>>>> loc_100000D30: ; CODE XREF: _main+129 j
>>>>>> ; _main+131 j ...
>>>>>> add r14d, ebx
>>>>>>
>>>>>> loc_100000D33: ; CODE XREF: _main+92 j
>>>>>> cmp r11d, edx
>>>>>> lea edi, [r11+1]
>>>>>> mov [rax+r8*4], edi
>>>>>> mov rcx, [rsp+48h+var_40]
>>>>>> mov esi, [rsp+48h+var_34]
>>>>>> mov r11d, edi
>>>>>> jl short loc_100000D84
>>>>>> nop dword ptr [rax+00h]
>>>>>>
>>>>>> loc_100000D50: ; CODE XREF: _main+DA j
>>>>>> mov edi, [rcx]
>>>>>> lea ebp, [rdi+1]
>>>>>> mov [rcx], ebp
>>>>>> cmp edi, edx
>>>>>> jl short loc_100000D71
>>>>>> mov dword ptr [rcx], 0
>>>>>> add rcx, 0FFFFFFFFFFFFFFFCh
>>>>>> test esi, esi
>>>>>> lea esi, [rsi-1]
>>>>>> jg short loc_100000D50
>>>>>> jmp loc_100000F0E
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000D71: ; CODE XREF: _main+C9 j
>>>>>> test esi, esi
>>>>>> js loc_100000F0E
>>>>>> mov dword ptr [rax+r8*4], 0
>>>>>> xor r11d, r11d
>>>>>>
>>>>>> loc_100000D84: ; CODE XREF: _main+BA j
>>>>>> cmp r15d, 1
>>>>>> mov esi, 0
>>>>>> mov r9, [rsp+48h+var_48]
>>>>>> mov r12d, 1
>>>>>> jle short loc_100000DF0
>>>>>>
>>>>>> loc_100000D99: ; CODE XREF: _main+15E j
>>>>>> mov r10d, [rax+rsi*4]
>>>>>> mov ecx, 0FFFFFFFFh
>>>>>> mov edi, 1
>>>>>> mov r13, r9
>>>>>> nop word ptr [rax+rax+00h]
>>>>>>
>>>>>> loc_100000DB0: ; CODE XREF: _main+14F j
>>>>>> xor ebx, ebx
>>>>>> mov ebp, r10d
>>>>>> sub ebp, [r13+0]
>>>>>> jz loc_100000D30
>>>>>> cmp ecx, ebp
>>>>>> jz loc_100000D30
>>>>>> cmp edi, ebp
>>>>>> jz loc_100000D30
>>>>>> add r13, 4
>>>>>> inc rdi
>>>>>> dec ecx
>>>>>> mov ebx, edi
>>>>>> add ebx, esi
>>>>>> cmp ebx, r15d
>>>>>> jl short loc_100000DB0
>>>>>> inc r12
>>>>>> add r9, 4
>>>>>> inc rsi
>>>>>> cmp r12d, r15d
>>>>>> jl short loc_100000D99
>>>>>>
>>>>>> loc_100000DF0: ; CODE XREF: _main+107 j
>>>>>> mov ebx, 1
>>>>>> jmp loc_100000D30
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000DFA: ; CODE XREF: _main+5E j
>>>>>> mov ecx, [rax+r8*4]
>>>>>> lea r9d, [rcx+1]
>>>>>> mov [rax+r8*4], r9d
>>>>>> cmp ecx, r8d
>>>>>> jge loc_100000F0E
>>>>>> lea r12, [rax+4]
>>>>>> xor r14d, r14d
>>>>>> db 2Eh
>>>>>> nop word ptr [rax+rax+00000000h]
>>>>>>
>>>>>> loc_100000E20: ; CODE XREF: _main+216 j
>>>>>> test r15d, r15d
>>>>>> setle cl
>>>>>> cmp r15d, 2
>>>>>> jl short loc_100000E90
>>>>>> test cl, cl
>>>>>> mov r13d, 0
>>>>>> mov r11, r12
>>>>>> mov r10d, 1
>>>>>> jnz short loc_100000E90
>>>>>>
>>>>>> loc_100000E3F: ; CODE XREF: _main+1F0 j
>>>>>> mov edi, [rax+r13*4]
>>>>>> mov edx, 0FFFFFFFFh
>>>>>> mov ecx, 1
>>>>>> mov rsi, r11
>>>>>>
>>>>>> loc_100000E50: ; CODE XREF: _main+1E1 j
>>>>>> xor ebx, ebx
>>>>>> mov ebp, edi
>>>>>> sub ebp, [rsi]
>>>>>> jz short loc_100000E95
>>>>>> cmp edx, ebp
>>>>>> jz short loc_100000E95
>>>>>> cmp ecx, ebp
>>>>>> jz short loc_100000E95
>>>>>> add rsi, 4
>>>>>> inc rcx
>>>>>> dec edx
>>>>>> mov ebx, ecx
>>>>>> add ebx, r13d
>>>>>> cmp ebx, r15d
>>>>>> jl short loc_100000E50
>>>>>> inc r10
>>>>>> add r11, 4
>>>>>> inc r13
>>>>>> cmp r10d, r15d
>>>>>> jl short loc_100000E3F
>>>>>> db 66h, 66h, 66h, 66h, 2Eh
>>>>>> nop word ptr [rax+rax+00000000h]
>>>>>>
>>>>>> loc_100000E90: ; CODE XREF: _main+19A j
>>>>>> ; _main+1AD j
>>>>>> mov ebx, 1
>>>>>>
>>>>>> loc_100000E95: ; CODE XREF: _main+1C6 j
>>>>>> ; _main+1CA j ...
>>>>>> add r14d, ebx
>>>>>> cmp r9d, r8d
>>>>>> lea ecx, [r9+1]
>>>>>> mov [rax+r8*4], ecx
>>>>>> mov r9d, ecx
>>>>>> jl loc_100000E20
>>>>>> jmp short loc_100000F0E
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000EAE: ; CODE XREF: _main+6B j
>>>>>> add r15d, 0FFFFFFFEh
>>>>>> movsxd rcx, r15d
>>>>>> lea rcx, [rax+rcx*4]
>>>>>> xor r14d, r14d
>>>>>> jmp short loc_100000EC6
>>>>>> ; ---------------------------------------------------------------------------
>>>>>> align 20h
>>>>>>
>>>>>> loc_100000EC0: ; CODE XREF: _main+247 j
>>>>>> ; _main+27C j
>>>>>> inc r14d
>>>>>> mov r11d, ebp
>>>>>>
>>>>>> loc_100000EC6: ; CODE XREF: _main+22C j
>>>>>> lea ebp, [r11+1]
>>>>>> mov [rax+r8*4], ebp
>>>>>> cmp r11d, r8d
>>>>>> mov rsi, rcx
>>>>>> mov edi, r15d
>>>>>> jl short loc_100000EC0
>>>>>> nop dword ptr [rax+00000000h]
>>>>>>
>>>>>> loc_100000EE0: ; CODE XREF: _main+26A j
>>>>>> mov ebp, [rsi]
>>>>>> lea ebx, [rbp+1]
>>>>>> mov [rsi], ebx
>>>>>> cmp ebp, edx
>>>>>> jl short loc_100000EFE
>>>>>> mov dword ptr [rsi], 0
>>>>>> add rsi, 0FFFFFFFFFFFFFFFCh
>>>>>> test edi, edi
>>>>>> lea edi, [rdi-1]
>>>>>> jg short loc_100000EE0
>>>>>> jmp short loc_100000F0E
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000EFE: ; CODE XREF: _main+259 j
>>>>>> test edi, edi
>>>>>> js short loc_100000F0E
>>>>>> mov dword ptr [rax+r8*4], 0
>>>>>> xor ebp, ebp
>>>>>> jmp short loc_100000EC0
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000F0E: ; CODE XREF: _main+DC j
>>>>>> ; _main+E3 j ...
>>>>>> mov rdi, rax ; void *
>>>>>> call _free
>>>>>> lea rdi, aDSolutions ; "%d solutions\n"
>>>>>> xor ebx, ebx
>>>>>> xor eax, eax
>>>>>> mov esi, r14d
>>>>>> call _printf
>>>>>>
>>>>>> loc_100000F29: ; CODE XREF: _main+16 j
>>>>>> mov eax, ebx
>>>>>> add rsp, 18h
>>>>>> pop rbx
>>>>>> pop r12
>>>>>> pop r13
>>>>>> pop r14
>>>>>> pop r15
>>>>>> pop rbp
>>>>>> retn
>>>>>> _main endp
>>>>>> ```
>>>>>>
>>>>>> clang 3.6's result:
>>>>>>
>>>>>> ```
>>>>>> public _main
>>>>>> _main proc near
>>>>>>
>>>>>> var_60 = qword ptr -60h
>>>>>> var_58 = qword ptr -58h
>>>>>> var_50 = qword ptr -50h
>>>>>> var_48 = qword ptr -48h
>>>>>> var_40 = qword ptr -40h
>>>>>> var_38 = qword ptr -38h
>>>>>>
>>>>>> push rbp
>>>>>> push r15
>>>>>> push r14
>>>>>> push r13
>>>>>> push r12
>>>>>> push rbx
>>>>>> sub rsp, 38h
>>>>>> mov ebx, 0FFFFFFFFh
>>>>>> cmp edi, 2
>>>>>> jnz loc_100000F23
>>>>>> mov rbx, offset __mh_execute_header
>>>>>> mov rdi, [rsi+8] ; char *
>>>>>> xor r13d, r13d
>>>>>> xor esi, esi ; char **
>>>>>> mov edx, 0Ah ; int
>>>>>> call _strtol
>>>>>> mov r14, rax
>>>>>> shl rax, 20h
>>>>>> mov [rsp+68h+var_38], rax
>>>>>> lea rsi, [rax+rbx]
>>>>>> sar rsi, 20h ; size_t
>>>>>> mov edi, 4 ; size_t
>>>>>> call _calloc
>>>>>> lea r11d, [r14-1]
>>>>>> movsxd r12, r11d
>>>>>> mov [rsp+68h+var_40], r12
>>>>>> movsxd rcx, r14d
>>>>>> mov [rsp+68h+var_50], rcx
>>>>>> add ecx, 0FFFFFFFEh
>>>>>> js loc_100000E1A
>>>>>> mov ecx, r14d
>>>>>> add ecx, 0FFFFFFFEh
>>>>>> movsxd rcx, ecx
>>>>>> inc rcx
>>>>>> mov [rsp+68h+var_58], rcx
>>>>>> mov rcx, rax
>>>>>> add rcx, 4
>>>>>> mov [rsp+68h+var_60], rcx
>>>>>> xor ebp, ebp
>>>>>> jmp short loc_100000D17
>>>>>> ; ---------------------------------------------------------------------------
>>>>>> align 10h
>>>>>>
>>>>>> loc_100000D10: ; CODE XREF: _main+15B j
>>>>>> ; _main+163 j ...
>>>>>> mov rbp, [rsp+68h+var_48]
>>>>>> add ebp, edi
>>>>>>
>>>>>> loc_100000D17: ; CODE XREF: _main+93 j
>>>>>> cmp r13d, r11d
>>>>>> lea edx, [r13+1]
>>>>>> mov [rax+r12*4], edx
>>>>>> mov rcx, [rsp+68h+var_58]
>>>>>> mov r13d, edx
>>>>>> jl short loc_100000D6B
>>>>>> nop dword ptr [rax+00h]
>>>>>>
>>>>>> loc_100000D30: ; CODE XREF: _main+DE j
>>>>>> mov edx, [rax+rcx*4-4]
>>>>>> lea esi, [rdx+1]
>>>>>> mov [rax+rcx*4-4], esi
>>>>>> cmp edx, r11d
>>>>>> jl short loc_100000D60
>>>>>> mov dword ptr [rax+rcx*4-4], 0
>>>>>> dec rcx
>>>>>> test rcx, rcx
>>>>>> jg short loc_100000D30
>>>>>> jmp loc_100000F09
>>>>>> ; ---------------------------------------------------------------------------
>>>>>> align 20h
>>>>>>
>>>>>> loc_100000D60: ; CODE XREF: _main+CE j
>>>>>> mov dword ptr [rax+r12*4], 0
>>>>>> xor r13d, r13d
>>>>>>
>>>>>> loc_100000D6B: ; CODE XREF: _main+BA j
>>>>>> mov [rsp+68h+var_48], rbp
>>>>>> test r14d, r14d
>>>>>> setle cl
>>>>>> mov rdx, offset __mh_execute_header
>>>>>> lea rdx, [rdx+1]
>>>>>> cmp [rsp+68h+var_38], rdx
>>>>>> jl loc_100000E10
>>>>>> test cl, cl
>>>>>> mov edx, 0
>>>>>> mov r10, [rsp+68h+var_60]
>>>>>> mov r9d, 1
>>>>>> jnz short loc_100000E10
>>>>>>
>>>>>> loc_100000DA3: ; CODE XREF: _main+195 j
>>>>>> mov esi, [rax+rdx*4]
>>>>>> mov r15d, 0FFFFFFFFh
>>>>>> mov r8d, 1
>>>>>> mov rcx, r10
>>>>>> db 66h, 66h, 2Eh
>>>>>> nop dword ptr [rax+rax+00000000h]
>>>>>>
>>>>>> loc_100000DC0: ; CODE XREF: _main+184 j
>>>>>> mov ebx, [rcx]
>>>>>> mov ebp, esi
>>>>>> sub ebp, ebx
>>>>>> xor edi, edi
>>>>>> cmp r8d, ebp
>>>>>> jz loc_100000D10
>>>>>> cmp esi, ebx
>>>>>> jz loc_100000D10
>>>>>> cmp r15d, ebp
>>>>>> jz loc_100000D10
>>>>>> add rcx, 4
>>>>>> inc r8
>>>>>> dec r15d
>>>>>> mov edi, r8d
>>>>>> add edi, edx
>>>>>> cmp edi, r14d
>>>>>> jl short loc_100000DC0
>>>>>> inc r9
>>>>>> add r10, 4
>>>>>> inc rdx
>>>>>> cmp r9, [rsp+68h+var_50]
>>>>>> jl short loc_100000DA3
>>>>>> nop word ptr [rax+rax+00000000h]
>>>>>>
>>>>>> loc_100000E10: ; CODE XREF: _main+119 j
>>>>>> ; _main+131 j
>>>>>> mov edi, 1
>>>>>> jmp loc_100000D10
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000E1A: ; CODE XREF: _main+6E j
>>>>>> test r14d, r14d
>>>>>> jle loc_100000F00
>>>>>> mov dword ptr [rax+r12*4], 1
>>>>>> xor ebp, ebp
>>>>>> cmp r14d, 2
>>>>>> jl loc_100000F09
>>>>>> mov rcx, rax
>>>>>> add rcx, 4
>>>>>> mov [rsp+68h+var_48], rcx
>>>>>> xor ebp, ebp
>>>>>> mov r15d, 1
>>>>>> nop dword ptr [rax+rax+00h]
>>>>>>
>>>>>> loc_100000E50: ; CODE XREF: _main+288 j
>>>>>> mov rbx, rbp
>>>>>> mov rcx, offset __mh_execute_header
>>>>>> cmp [rsp+68h+var_38], rcx
>>>>>> mov edx, 0
>>>>>> mov r13, [rsp+68h+var_48]
>>>>>> mov r8d, 1
>>>>>> mov r9d, 1
>>>>>> jle short loc_100000EE0
>>>>>>
>>>>>> loc_100000E7A: ; CODE XREF: _main+25A j
>>>>>> mov r12d, [rax+rdx*4]
>>>>>> mov edi, 0FFFFFFFFh
>>>>>> mov ecx, 1
>>>>>> mov rsi, r13
>>>>>> nop dword ptr [rax+rax+00h]
>>>>>>
>>>>>> loc_100000E90: ; CODE XREF: _main+249 j
>>>>>> mov r10d, [rsi]
>>>>>> mov ebp, r12d
>>>>>> sub ebp, r10d
>>>>>> xor r9d, r9d
>>>>>> cmp ecx, ebp
>>>>>> jz short loc_100000EE0
>>>>>> cmp r12d, r10d
>>>>>> jz short loc_100000EE0
>>>>>> cmp edi, ebp
>>>>>> jz short loc_100000EE0
>>>>>> add rsi, 4
>>>>>> inc rcx
>>>>>> dec edi
>>>>>> mov ebp, ecx
>>>>>> add ebp, edx
>>>>>> cmp ebp, r14d
>>>>>> jl short loc_100000E90
>>>>>> inc r8
>>>>>> add r13, 4
>>>>>> inc rdx
>>>>>> cmp r8, [rsp+68h+var_50]
>>>>>> jl short loc_100000E7A
>>>>>> mov r9d, 1
>>>>>> db 66h, 66h, 66h, 66h, 2Eh
>>>>>> nop word ptr [rax+rax+00000000h]
>>>>>>
>>>>>> loc_100000EE0: ; CODE XREF: _main+208 j
>>>>>> ; _main+22E j ...
>>>>>> mov rbp, rbx
>>>>>> add ebp, r9d
>>>>>> cmp r15d, r11d
>>>>>> lea ecx, [r15+1]
>>>>>> mov rdx, [rsp+68h+var_40]
>>>>>> mov [rax+rdx*4], ecx
>>>>>> mov r15d, ecx
>>>>>> jl loc_100000E50
>>>>>> jmp short loc_100000F09
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000F00: ; CODE XREF: _main+1AD j
>>>>>> xor ebp, ebp
>>>>>> test r11d, r11d
>>>>>> cmovns ebp, r11d
>>>>>>
>>>>>> loc_100000F09: ; CODE XREF: _main+E0 j
>>>>>> ; _main+1C1 j ...
>>>>>> mov rdi, rax ; void *
>>>>>> call _free
>>>>>> lea rdi, aDSolutions ; "%d solutions\n"
>>>>>> xor ebx, ebx
>>>>>> xor eax, eax
>>>>>> mov esi, ebp
>>>>>> call _printf
>>>>>>
>>>>>> loc_100000F23: ; CODE XREF: _main+16 j
>>>>>> mov eax, ebx
>>>>>> add rsp, 38h
>>>>>> pop rbx
>>>>>> pop r12
>>>>>> pop r13
>>>>>> pop r14
>>>>>> pop r15
>>>>>> pop rbp
>>>>>> retn
>>>>>> _main endp
>>>>>> ```
>>>>>>
>>>>>> gcc-4.9.2's result:
>>>>>> ```
>>>>>>
>>>>>> _main proc near
>>>>>>
>>>>>> var_48 = qword ptr -48h
>>>>>> var_40 = dword ptr -40h
>>>>>> var_3C = dword ptr -3Ch
>>>>>>
>>>>>> cmp edi, 2
>>>>>> jz short loc_100000D69
>>>>>> or eax, 0FFFFFFFFh
>>>>>> retn
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000D69: ; CODE XREF: _main+3 j
>>>>>> push r15
>>>>>> mov edx, 0Ah ; int
>>>>>> push r14
>>>>>> push r13
>>>>>> push r12
>>>>>> push rbp
>>>>>> push rbx
>>>>>> sub rsp, 18h
>>>>>> mov rdi, [rsi+8] ; char *
>>>>>> xor esi, esi ; char **
>>>>>> call _strtol
>>>>>> mov edi, 4 ; size_t
>>>>>> lea esi, [rax+1]
>>>>>> mov r14, rax
>>>>>> mov ebx, eax
>>>>>> lea r15d, [r14-2]
>>>>>> movsxd rsi, esi ; size_t
>>>>>> call _calloc
>>>>>> mov [rsp+48h+var_3C], 0
>>>>>> mov rdi, rax ; void *
>>>>>> lea eax, [r14-1]
>>>>>> cdqe
>>>>>> lea r13, [rdi+rax*4]
>>>>>> movsxd rax, r15d
>>>>>> mov ebp, [r13+0]
>>>>>> shl rax, 2
>>>>>> lea r12, [rdi+rax]
>>>>>> lea rax, [rdi+rax-4]
>>>>>> mov [rsp+48h+var_48], rax
>>>>>> mov eax, r14d
>>>>>> lea r14d, [r14+1]
>>>>>> nop word ptr [rax+rax+00h]
>>>>>> nop word ptr [rax+rax+00h]
>>>>>>
>>>>>> loc_100000DE0: ; CODE XREF: _main+12B j
>>>>>> ; _main+155 j ...
>>>>>> add ebp, 1
>>>>>> cmp ebx, ebp
>>>>>> mov [r13+0], ebp
>>>>>> jg short loc_100000E62
>>>>>> test r15d, r15d
>>>>>> js short loc_100000E33
>>>>>> mov ecx, [r12]
>>>>>> lea edx, [rcx+1]
>>>>>> cmp ebx, edx
>>>>>> mov [r12], edx
>>>>>> jg short loc_100000E58
>>>>>> mov r8, r12
>>>>>> mov rcx, [rsp+48h+var_48]
>>>>>> mov esi, r15d
>>>>>> jmp short loc_100000E24
>>>>>> ; ---------------------------------------------------------------------------
>>>>>> align 10h
>>>>>>
>>>>>> loc_100000E10: ; CODE XREF: _main+D1 j
>>>>>> mov edx, [rcx]
>>>>>> sub r8, 4
>>>>>> sub rcx, 4
>>>>>> add edx, 1
>>>>>> mov [rcx+4], edx
>>>>>> cmp ebx, edx
>>>>>> jg short loc_100000E58
>>>>>>
>>>>>> loc_100000E24: ; CODE XREF: _main+A9 j
>>>>>> sub esi, 1
>>>>>> mov dword ptr [r8], 0
>>>>>> cmp esi, 0FFFFFFFFh
>>>>>> jnz short loc_100000E10
>>>>>>
>>>>>> loc_100000E33: ; CODE XREF: _main+8E j
>>>>>> call _free
>>>>>> mov esi, [rsp+48h+var_3C]
>>>>>> add rsp, 18h
>>>>>> xor eax, eax
>>>>>> pop rbx
>>>>>> lea rdi, aDSolutions ; "%d solutions\n"
>>>>>> pop rbp
>>>>>> pop r12
>>>>>> pop r13
>>>>>> pop r14
>>>>>> pop r15
>>>>>> jmp _printf
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000E58: ; CODE XREF: _main+9D j
>>>>>> ; _main+C2 j
>>>>>> mov dword ptr [r13+0], 0
>>>>>> xor ebp, ebp
>>>>>>
>>>>>> loc_100000E62: ; CODE XREF: _main+89 j
>>>>>> test ebx, ebx
>>>>>> jle loc_100000EE6
>>>>>> lea r11, [rdi+8]
>>>>>> xor r10d, r10d
>>>>>>
>>>>>> loc_100000E71: ; CODE XREF: _main+184 j
>>>>>> add r10d, 1
>>>>>> cmp r10d, eax
>>>>>> jz short loc_100000EE6
>>>>>> mov r8d, [r11-8]
>>>>>> mov edx, r8d
>>>>>> sub edx, [r11-4]
>>>>>> add edx, 1
>>>>>> cmp edx, 2
>>>>>> jbe loc_100000DE0
>>>>>> mov r9d, r14d
>>>>>> mov rcx, r11
>>>>>> mov edx, 1
>>>>>> mov [rsp+48h+var_40], r10d
>>>>>> sub r9d, r10d
>>>>>> jmp short loc_100000ED3
>>>>>> ; ---------------------------------------------------------------------------
>>>>>> align 10h
>>>>>>
>>>>>> loc_100000EB0: ; CODE XREF: _main+179 j
>>>>>> mov esi, r8d
>>>>>> sub esi, [rcx]
>>>>>> jz loc_100000DE0
>>>>>> mov r10d, esi
>>>>>> add rcx, 4
>>>>>> add r10d, edx
>>>>>> jz loc_100000DE0
>>>>>> cmp esi, edx
>>>>>> jz loc_100000DE0
>>>>>>
>>>>>> loc_100000ED3: ; CODE XREF: _main+144 j
>>>>>> add edx, 1
>>>>>> cmp edx, r9d
>>>>>> jnz short loc_100000EB0
>>>>>> mov r10d, [rsp+48h+var_40]
>>>>>> add r11, 4
>>>>>> jmp short loc_100000E71
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_100000EE6: ; CODE XREF: _main+104 j
>>>>>> ; _main+118 j
>>>>>> add [rsp+48h+var_3C], 1
>>>>>> jmp loc_100000DE0
>>>>>> _main endp
>>>>>> ```
>>>>>>
>>>>>> MSVC 10.0's result:
>>>>>>
>>>>>> ```
>>>>>>
>>>>>> _main proc near ; CODE XREF: ___tmainCRTStartup+106 p
>>>>>>
>>>>>> var_80 = dword ptr -80h
>>>>>> var_7C = dword ptr -7Ch
>>>>>> var_78 = dword ptr -78h
>>>>>> var_74 = dword ptr -74h
>>>>>> var_70 = dword ptr -70h
>>>>>> var_6C = dword ptr -6Ch
>>>>>> var_68 = dword ptr -68h
>>>>>> var_64 = dword ptr -64h
>>>>>> var_60 = dword ptr -60h
>>>>>> var_5C = dword ptr -5Ch
>>>>>> argc = dword ptr 8
>>>>>> argv = dword ptr 0Ch
>>>>>> envp = dword ptr 10h
>>>>>>
>>>>>> push ebp
>>>>>> mov ebp, esp
>>>>>> and esp, 0FFFFFF80h
>>>>>> push esi
>>>>>> push edi
>>>>>> push ebx
>>>>>> sub esp, 74h
>>>>>> push 3
>>>>>> call sub_4080F0
>>>>>> add esp, 4
>>>>>> stmxcsr [esp+80h+var_80]
>>>>>> or [esp+80h+var_80], 8000h
>>>>>> ldmxcsr [esp+80h+var_80]
>>>>>> cmp [ebp+argc], 2
>>>>>> jz short loc_40103A
>>>>>> mov eax, 0FFFFFFFFh
>>>>>> add esp, 74h
>>>>>> pop ebx
>>>>>> pop edi
>>>>>> pop esi
>>>>>> mov esp, ebp
>>>>>> pop ebp
>>>>>> retn
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_40103A: ; CODE XREF: _main+29 j
>>>>>> call ds:GetTickCount
>>>>>> mov esi, eax
>>>>>> mov eax, [ebp+argv]
>>>>>> push dword ptr [eax+4] ; char *
>>>>>> call _atoi
>>>>>> mov edi, eax
>>>>>> lea eax, [edi+1]
>>>>>> push eax ; size_t
>>>>>> push 4 ; size_t
>>>>>> call _calloc
>>>>>> add esp, 0Ch
>>>>>> mov ecx, [eax+edi*4-4]
>>>>>> lea edx, [edi-1]
>>>>>> mov [esp+80h+var_6C], ecx
>>>>>> xor ebx, ebx
>>>>>> mov [esp+80h+var_7C], ebx
>>>>>> lea ecx, [eax+edi*4]
>>>>>> mov [esp+80h+var_74], ecx
>>>>>> lea ecx, [edi-2]
>>>>>> mov [esp+80h+var_70], ecx
>>>>>> mov [esp+80h+var_60], edx
>>>>>> mov [esp+80h+var_80], esi
>>>>>> mov ecx, [esp+80h+var_6C]
>>>>>>
>>>>>> loc_401087: ; CODE XREF: _main+142 j
>>>>>> ; _main+193 j
>>>>>> mov edx, [esp+80h+var_60]
>>>>>> inc ecx
>>>>>> mov [eax+edi*4-4], ecx
>>>>>> cmp edi, [eax+edx*4]
>>>>>> jg short loc_4010DC
>>>>>> mov esi, [esp+80h+var_70]
>>>>>> test esi, esi
>>>>>> js short loc_4010CE
>>>>>> xor edx, edx
>>>>>> mov [esp+80h+var_78], eax
>>>>>> xor ebx, ebx
>>>>>> mov eax, [esp+80h+var_74]
>>>>>>
>>>>>> loc_4010A9: ; CODE XREF: _main+C8 j
>>>>>> mov ecx, [eax+ebx*4-8]
>>>>>> inc ecx
>>>>>> cmp ecx, edi
>>>>>> jl loc_40117A
>>>>>> inc edx
>>>>>> lea esi, [ebx+edi-3]
>>>>>> mov dword ptr [eax+ebx*4-8], 0
>>>>>> dec ebx
>>>>>> cmp edx, [esp+80h+var_60]
>>>>>> jb short loc_4010A9
>>>>>> mov eax, [esp+80h+var_78]
>>>>>>
>>>>>> loc_4010CE: ; CODE XREF: _main+9B j
>>>>>> ; _main+186 j
>>>>>> test esi, esi
>>>>>> jl short loc_401147
>>>>>> mov dword ptr [eax+edi*4-4], 0
>>>>>> xor ecx, ecx
>>>>>>
>>>>>> loc_4010DC: ; CODE XREF: _main+93 j
>>>>>> test edi, edi
>>>>>> jle short loc_40113E
>>>>>> mov [esp+80h+var_6C], ecx
>>>>>> xor edx, edx
>>>>>> mov [esp+80h+var_5C], edi
>>>>>>
>>>>>> loc_4010EA: ; CODE XREF: _main+132 j
>>>>>> lea ecx, [edx+1]
>>>>>> mov ebx, ecx
>>>>>> mov esi, ebx
>>>>>> cmp ecx, [esp+80h+var_5C]
>>>>>> jge short loc_401130
>>>>>> mov edx, [eax+edx*4]
>>>>>> mov edi, 1
>>>>>> mov [esp+80h+var_64], esi
>>>>>> mov [esp+80h+var_68], ecx
>>>>>>
>>>>>> loc_401107: ; CODE XREF: _main+122 j
>>>>>> mov esi, [eax+ebx*4]
>>>>>> cmp edx, esi
>>>>>> jz short loc_40118B
>>>>>> sub esi, edx
>>>>>> mov ecx, esi
>>>>>> neg ecx
>>>>>> cmp edi, ecx
>>>>>> jz short loc_40118B
>>>>>> cmp esi, edi
>>>>>> jz short loc_40118B
>>>>>> inc ebx
>>>>>> inc edi
>>>>>> cmp ebx, [esp+80h+var_5C]
>>>>>> jl short loc_401107
>>>>>> mov ecx, [esp+80h+var_68]
>>>>>> mov esi, [esp+80h+var_64]
>>>>>> cmp ecx, [esp+80h+var_5C]
>>>>>>
>>>>>> loc_401130: ; CODE XREF: _main+F5 j
>>>>>> mov edx, esi
>>>>>> jl short loc_4010EA
>>>>>> xchg ax, ax
>>>>>> mov ecx, [esp+80h+var_6C]
>>>>>> mov edi, [esp+80h+var_5C]
>>>>>>
>>>>>> loc_40113E: ; CODE XREF: _main+DE j
>>>>>> inc [esp+80h+var_7C]
>>>>>> jmp loc_401087
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_401147: ; CODE XREF: _main+D0 j
>>>>>> mov ebx, [esp+80h+var_7C]
>>>>>> mov esi, [esp+80h+var_80]
>>>>>> push eax ; void *
>>>>>> call _free
>>>>>> add esp, 4
>>>>>> call ds:GetTickCount
>>>>>> sub eax, esi
>>>>>> push eax
>>>>>> push ebx
>>>>>> push offset aDSolutionsInDM ; "%d solutions in %d msecs.\n"
>>>>>> call _printf
>>>>>> xor eax, eax
>>>>>> add esp, 80h
>>>>>> pop ebx
>>>>>> pop edi
>>>>>> pop esi
>>>>>> mov esp, ebp
>>>>>> pop ebp
>>>>>> retn
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_40117A: ; CODE XREF: _main+B0 j
>>>>>> mov edx, [esp+80h+var_74]
>>>>>> mov eax, [esp+80h+var_78]
>>>>>> mov [edx+ebx*4-8], ecx
>>>>>> jmp loc_4010CE
>>>>>> ; ---------------------------------------------------------------------------
>>>>>>
>>>>>> loc_40118B: ; CODE XREF: _main+10C j
>>>>>> ; _main+116 j ...
>>>>>> mov ecx, [esp+80h+var_6C]
>>>>>> mov edi, [esp+80h+var_5C]
>>>>>> jmp loc_401087
>>>>>> _main endp
>>>>>> ```
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev