[LLVMbugs] [Bug 5022] New: setz+movzxb -> xor+setz optimization
bugzilla-daemon at cs.uiuc.edu
Mon Sep 21 13:54:16 PDT 2009
http://llvm.org/bugs/show_bug.cgi?id=5022
Summary: setz+movzxb -> xor+setz optimization
Product: libraries
Version: trunk
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: edwintorok at gmail.com
CC: llvmbugs at cs.uiuc.edu
This IR:
define internal fastcc i32 @bc0f0(i8*) nounwind {
%2 = getelementptr inbounds i8* %0, i32 80 ; <i8*> [#uses=1]
%3 = bitcast i8* %2 to i32** ; <i32**> [#uses=1]
%4 = load i32** %3 ; <i32*> [#uses=1]
%5 = getelementptr inbounds i32* %4, i32 3 ; <i32*> [#uses=1]
%6 = load i32* %5 ; <i32> [#uses=1]
%7 = icmp eq i32 %6, 2 ; <i1> [#uses=1]
%8 = zext i1 %7 to i32 ; <i32> [#uses=1]
ret i32 %8
}
Translates to this:
mov 0x50(%rdi), %rax
cmp $0x2, 0xc(%rax)
setz %al
movzxb %al, %eax
ret
This is fine, but it could be tuned a bit further: use some other register (say rcx)
instead of rax in the first mov. Then rax can be cleared with an xor at the same time,
and a single setz %al is enough, with no need for the movzxb. A sketch of the intended
sequence is shown below.
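Something like the following (a sketch of the suggested sequence, with registers chosen
for illustration; note the xor has to be placed before the cmp, since xor clobbers the
flags that setz reads):

xor %eax, %eax          # clears all of rax; independent of the loads, so it can
                        # execute in parallel with them
mov 0x50(%rdi), %rcx    # use rcx instead of rax for the pointer load
cmp $0x2, 0xc(%rcx)
setz %al                # only the low byte needs to be written now
ret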
According to http://gmplib.org/~tege/x86-timing.pdf,
movzx has timings: 1, 2.5, 1, 2.5; 1, 3, 1, 3
xor has timings: 1, 2, 1, 2; 1, 3, 1, 3
So the xor is at least as fast (and faster on P4), and it can run in parallel with the
mov/cmp, avoiding one data dependency (setz -> movzx).