[llvm-commits] [llvm] r120932 - in /llvm/trunk: lib/Target/X86/X86ISelLowering.cpp test/CodeGen/X86/select.ll

Evan Cheng evan.cheng at apple.com
Sun Dec 5 15:32:03 PST 2010


On Dec 5, 2010, at 2:52 PM, Chris Lattner wrote:

> 
> On Dec 5, 2010, at 2:22 PM, Evan Cheng wrote:
> 
>>> 
>>> __Z4funcl:                              ## @_Z4funcl
>>> 	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
>>> 	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
>>> 	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
>>> 	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
>>> 	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
>>> 	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
>>> 	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
>>> 	jmp	__Znam                  ## TAILCALL
>>>                                      ## encoding: [0xeb,A]
>> 
>> Why is this an improvement?
> 
> cmov is high latency and generally bad,

You are showing your age. :-) That's true for P4, but not since then.

> but this also helps if the other value is an imm or load, because those can be folded into orq but not into cmov.

That may be true. I'd still call this optimization suspect. A couple of cycles of latency means very little on modern x86. Instruction throughput is probably just as important, and the extra instructions here may actually be a pessimization.

Both gcc and icc just generate something like this. What are we missing? They don't seem to be concerned with overflow:
        salq    $2, %rdi
        jmp     __Znam

Evan

> 
> -Chris

More information about the llvm-commits mailing list