[llvm-commits] [llvm] r120932 - in /llvm/trunk: lib/Target/X86/X86ISelLowering.cpp test/CodeGen/X86/select.ll
Evan Cheng
evan.cheng at apple.com
Sun Dec 5 15:32:03 PST 2010
On Dec 5, 2010, at 2:52 PM, Chris Lattner wrote:
>
> On Dec 5, 2010, at 2:22 PM, Evan Cheng wrote:
>
>>>
>>> __Z4funcl: ## @_Z4funcl
>>> movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
>>> movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
>>> mulq %rcx ## encoding: [0x48,0xf7,0xe1]
>>> cmpq $1, %rdx ## encoding: [0x48,0x83,0xfa,0x01]
>>> sbbq %rdi, %rdi ## encoding: [0x48,0x19,0xff]
>>> notq %rdi ## encoding: [0x48,0xf7,0xd7]
>>> orq %rax, %rdi ## encoding: [0x48,0x09,0xc7]
>>> jmp __Znam ## TAILCALL
>>> ## encoding: [0xeb,A]
>>
>> Why is this an improvement?
>
> cmov is high latency and generally bad,
You are showing your age. :-) That's true for P4, but not since then.
> but this also helps if the other value is an imm or load, because those can be folded into orq but not into cmov.
That may be true. I'd still call this optimization suspect. A couple of cycles of latency means very little on modern x86; instruction throughput is just as important, and the extra instruction here may actually be a pessimization.
Both gcc and icc just generate something like this. What are we missing? It doesn't seem like they are concerned with overflow:
salq $2, %rdi
jmp __Znam
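For reference, here is my reading of what the cmp/sbb/not/or sequence in the commit computes, sketched in C (`checked_size` is a made-up name, not anything from the patch): on multiply overflow the requested size is forced to all-ones, so operator new[] fails cleanly, whereas a bare `salq $2, %rdi` silently wraps and allocates a too-small block.

```c
#include <stdint.h>

/* Sketch of the branchless overflow clamp emitted by the commit
 * (assumes a 64-bit target with __uint128_t, as in GCC/Clang). */
static uint64_t checked_size(uint64_t n) {
    __uint128_t prod = (__uint128_t)n * 4;  /* movl $4, %ecx; mulq %rcx  */
    uint64_t lo = (uint64_t)prod;           /* low half in %rax          */
    uint64_t hi = (uint64_t)(prod >> 64);   /* high half in %rdx         */
    uint64_t mask = hi ? ~0ULL : 0;         /* cmpq $1, %rdx; sbbq; notq */
    return lo | mask;                       /* orq %rax, %rdi            */
}
```

So e.g. `checked_size(1ULL << 62)` overflows the 64-bit product and yields `UINT64_MAX` (forcing the allocation to throw) rather than wrapping to 0 as the gcc/icc shift would.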
Evan
>
> -Chris