[LLVMbugs] [Bug 3707] New: Inefficient loop codegen
bugzilla-daemon at cs.uiuc.edu
Tue Mar 3 04:57:08 PST 2009
http://llvm.org/bugs/show_bug.cgi?id=3707
Summary: Inefficient loop codegen
Product: libraries
Version: trunk
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: jturner at minnow-lang.org
CC: llvmbugs at cs.uiuc.edu
Compiling this with the llvm-gcc toolchain (through opt and llc):
#include <stdio.h>
int main() {
  int loop = 1000000000;
  int timeout;
timeoutloop:
  timeout = 2000;
  /* asm("nop;"); */
loopto:
  if (--timeout == 0) goto timeoutloop;
  if (--loop != 0) goto loopto;
  printf("Timeout: %i\n", timeout);
  return 0;
}
Yields this asm as output (I'm using OS X):
.text
.align 4,0x90
.globl _main
_main:
subl $12, %esp
movl $1999, %eax
xorl %ecx, %ecx
movl $1999, %edx
.align 4,0x90
LBB1_1: ## loopto
cmpl $1, %eax
leal -1(%eax), %eax
cmove %edx, %eax
incl %ecx
cmpl $999999999, %ecx
jne LBB1_1 ## loopto
LBB1_2: ## bb1
movl %eax, 4(%esp)
movl $LC, (%esp)
call _printf
xorl %eax, %eax
addl $12, %esp
ret
.section __TEXT,__cstring,cstring_literals
LC: ## LC
.asciz "Timeout: %i\n"
.subsections_via_symbols
Which runs in 1.7s on this machine.
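For readers who prefer C to assembly, here is a rough sketch of the loop in the listing above, read back into C. The function name and the parameters `loop` and `start` are hypothetical; they stand in for the constants 1000000000 and 2000. This is only an approximation of what the optimizer produced, not output from any tool.

```c
#include <assert.h>

/* Hypothetical helper: the compiled loop, rendered back into C.
   The first trip through the original goto loop has been folded into
   the initial values, so timeout starts at start - 1 and the counter
   runs for loop - 1 iterations. */
static int timeout_select(int loop, int start) {
    assert(loop >= 1 && start >= 2);   /* start == 1 would never terminate */
    int timeout = start - 1;           /* movl $1999, %eax */
    for (int i = 0; i != loop - 1; ++i) {
        /* cmpl $1, %eax ; leal -1(%eax), %eax ; cmove %edx, %eax */
        timeout = (timeout == 1) ? start - 1 : timeout - 1;
    }
    return timeout;
}
```

Each iteration carries a compare, a decrement, and a conditional move that all depend on the previous value of `timeout`, followed by a separate increment and compare for the counter.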
Uncommenting the 'asm("nop")' in the C code above instead yields this output:
.text
.align 4,0x90
.globl _main
_main:
subl $12, %esp
movl $1000000000, %eax
.align 4,0x90
LBB1_1: ## loopto.thread
movl %eax, %ecx
## InlineAsm Start
nop;
## InlineAsm End
movl $4294967295, %edx
jmp LBB1_3 ## bb
LBB1_2: ## loopto
decl %eax
incl %edx
cmpl $1998, %edx
je LBB1_1 ## loopto.thread
LBB1_3: ## bb
cmpl $1, %eax
jne LBB1_2 ## loopto
LBB1_4: ## bb1
subl %ecx, %eax
addl $1999, %eax
movl %eax, 4(%esp)
movl $LC, (%esp)
call _printf
xorl %eax, %eax
addl $12, %esp
ret
.section __TEXT,__cstring,cstring_literals
LC: ## LC
.asciz "Timeout: %i\n"
.subsections_via_symbols
Which runs in 1.0s.
The simpler loop (without the nop) runs slower than the one with the nop. Evan
Cheng points out on the LLVM mailing list:
"The main issue is incl updates the EFLAGS condition code register. But
llvm x86 isn't taking advantage of that. This is a known issue,
hopefully someone will find the time to implement before 2.6.
The second issue is the leal -1 can be turned (back) into a decl.
Combine that with the optimization previously described, it can
eliminate the first cmpl."
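One reading of the two suggestions quoted above, sketched in C. The function name and parameters are hypothetical, and the comments describe the instructions a backend could in principle emit, not what LLVM emits today. Counting the loop counter down to zero is my assumption about how the flags set by the counter update could replace the compare against 999999999.

```c
#include <assert.h>

/* Hypothetical helper: decrement first, then test the result against zero,
   so the flags produced by the decrement can drive the select. */
static int timeout_flags(int loop, int start) {
    assert(loop >= 1 && start >= 2);
    int timeout = start - 1;
    for (int n = loop - 1; n != 0; --n) {  /* decl sets ZF; jne needs no cmpl */
        --timeout;                         /* decl instead of leal -1 */
        if (timeout == 0)                  /* ZF from the decl above */
            timeout = start - 1;           /* cmove %edx, %eax */
    }
    return timeout;
}
```

This computes the same result as the original loop; whether a given compiler actually drops both compares is not something the C source can guarantee.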
Another possibility is that the cmove in this case is slower than a jz to a
block that resets %eax. Modifying the original asm output above by hand:
.text
.align 4,0x90
.globl _main
_main:
subl $12, %esp
movl $1999, %eax
xorl %ecx, %ecx
movl $1999, %edx
jmp LBB1_1
.align 4,0x90
LBB1_3:
movl %edx, %eax
jmp LBB1_4
LBB1_1: ## loopto
cmpl $1, %eax
leal -1(%eax), %eax
jz LBB1_3
LBB1_4:
incl %ecx
cmpl $999999999, %ecx
jnz LBB1_1 ## loopto
jmp LBB1_2
LBB1_2: ## bb1
movl %eax, 4(%esp)
movl $LC, (%esp)
call _printf
xorl %eax, %eax
addl $12, %esp
ret
.section __TEXT,__cstring,cstring_literals
LC: ## LC
.asciz "Timeout: %i\n"
.subsections_via_symbols
Which also runs in 1.0s.
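The hand-modified listing above corresponds roughly to the following C, where the reset sits behind a branch that is taken only once every 1999 iterations. The function name and parameters are hypothetical. A compiler is free to if-convert this back into a conditional move, so the source shape alone does not pin down the codegen.

```c
#include <assert.h>

/* Hypothetical helper: branch-based form of the same loop. */
static int timeout_branch(int loop, int start) {
    assert(loop >= 1 && start >= 2);
    int timeout = start - 1;               /* movl $1999, %eax */
    for (int i = 0; i != loop - 1; ++i) {
        int next = timeout - 1;            /* leal -1(%eax), %eax */
        if (timeout == 1)                  /* cmpl $1, %eax ; jz LBB1_3 */
            next = start - 1;              /* movl %edx, %eax */
        timeout = next;
    }
    return timeout;
}
```

With the original constants this form and the goto loop from the report should both end with timeout equal to 1750, since 1000000000 = 500250 * 1999 + 250.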