[LLVMbugs] [Bug 5022] New: setz+movzxb -> xor+setz optimization
bugzilla-daemon at cs.uiuc.edu
Mon Sep 21 13:54:16 PDT 2009
http://llvm.org/bugs/show_bug.cgi?id=5022
Summary: setz+movzxb -> xor+setz optimization
Product: libraries
Version: trunk
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: edwintorok at gmail.com
CC: llvmbugs at cs.uiuc.edu
This IR:
define internal fastcc i32 @bc0f0(i8*) nounwind {
%2 = getelementptr inbounds i8* %0, i32 80 ; <i8*> [#uses=1]
%3 = bitcast i8* %2 to i32** ; <i32**> [#uses=1]
%4 = load i32** %3 ; <i32*> [#uses=1]
%5 = getelementptr inbounds i32* %4, i32 3 ; <i32*> [#uses=1]
%6 = load i32* %5 ; <i32> [#uses=1]
%7 = icmp eq i32 %6, 2 ; <i1> [#uses=1]
%8 = zext i1 %7 to i32 ; <i32> [#uses=1]
ret i32 %8
}
Translates to this:
mov 0x50(%rdi), %rax
cmp $0x2, 0xc(%rax)
setz %al
movzxb %al, %eax
ret
This is fine, but it could be tuned a bit further: use some other register (say rcx)
instead of rax in the first mov. Then rax can be cleared with an xor at the same time,
and a single setz %al is enough, with no need for the movzxb. A sketch of the intended
sequence is shown below.
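Something like the following (a sketch of the suggested sequence, with registers chosen
for illustration; note the xor has to be placed before the cmp, since xor clobbers the
flags that setz reads):

xor %eax, %eax          # clears all of rax; independent of the loads, so it can
                        # execute in parallel with them
mov 0x50(%rdi), %rcx    # use rcx instead of rax for the pointer load
cmp $0x2, 0xc(%rcx)
setz %al                # only the low byte needs to be written now
ret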
According to http://gmplib.org/~tege/x86-timing.pdf,
movzx has timings: 1, 2.5, 1, 2.5; 1, 3, 1, 3
xor has timings: 1, 2, 1, 2; 1, 3, 1, 3
So the xor is at least as fast (and faster on P4), and it can run in parallel with the
mov/cmp, avoiding one data dependency (setz -> movzx).