[llvm-commits] [llvm] r124526 - in /llvm/trunk: include/llvm/Transforms/Utils/ lib/CodeGen/ lib/Transforms/Scalar/ lib/Transforms/Utils/ test/CodeGen/X86/ test/Transforms/JumpThreading/ test/Transforms/SimplifyCFG/

Wed Feb 2 10:51:52 PST 2011

On Feb 2, 2011, at 10:30 AM, Jakob Stoklund Olesen wrote:

> 
> On Feb 2, 2011, at 10:17 AM, Evan Cheng wrote:
> 
>> 
>> On Feb 2, 2011, at 9:17 AM, Bob Wilson wrote:
>> 
>>> 
>>> On Jan 28, 2011, at 8:46 PM, Evan Cheng wrote:
>>>> Modified: llvm/trunk/lib/CodeGen/TailDuplication.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TailDuplication.cpp?rev=124526&r1=124525&r2=124526&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/CodeGen/TailDuplication.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/TailDuplication.cpp Fri Jan 28 22:46:23 2011
>>>> @@ -502,7 +505,7 @@
>>>> }
>>>> // Heuristically, don't tail-duplicate calls if it would expand code size,
>>>> // as it's less likely to be worth the extra cost.
>>>> -  if (InstrCount > 1 && HasCall)
>>>> +  if (InstrCount > 1 && (PreRegAlloc && HasCall))
>>> 
>>> What is this change for?  The other tail duplication change in this patch enables the pre-RA pass to handle returns, and I understand that part of it.  This part makes the post-RA tail dup pass more aggressive about duplicating blocks with calls, and I don't see the connection between that and the rest of your patch.  If there's a good reason for this, could you add a comment to explain?
>> 
>> I was looking at a test case and didn't quite understand why calls were not being duplicated. I assume the reason is that a call is often a register allocation barrier. Before register allocation, there is no way to know whether the call would cause register spills around it.
> 
> After register allocation, a call still takes a long time to execute, so the relative benefit of duplicating the block is quite small.

I don't get this argument. The call is going to be executed either way; we are only concerned with whether getting rid of the unconditional branch is worth the code size increase. Isn't a branch + call, which is just a branch to a branch, bad? The case I really want to get is duplicating a tail call into the predecessor.
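
For reference, here is a minimal stand-alone sketch of the condition from the quoted diff, before and after r124526. The struct and function names (TailBlockInfo, rejectedBeforePatch, rejectedAfterPatch) are hypothetical and only model this one check; the real code in TailDuplication.cpp applies other tests that are not shown here.

```cpp
// Hypothetical model of the single size heuristic touched by the patch.
// Field names mirror the variables in the diff; everything else is assumed.
struct TailBlockInfo {
  unsigned InstrCount; // number of instructions that would be copied
  bool HasCall;        // the candidate block contains a call
};

// Old condition: a multi-instruction block containing a call is rejected,
// regardless of which tail duplication pass is running.
constexpr bool rejectedBeforePatch(TailBlockInfo B) {
  return B.InstrCount > 1 && B.HasCall;
}

// New condition: the same block is rejected only in the pre-RA pass,
// so the post-RA pass may now duplicate it.
constexpr bool rejectedAfterPatch(TailBlockInfo B, bool PreRegAlloc) {
  return B.InstrCount > 1 && (PreRegAlloc && B.HasCall);
}
```

Under this model, a two-instruction block with a call is rejected by the old condition in both passes, while the new condition rejects it pre-RA and lets it through post-RA. A single-instruction block is never rejected by this particular check.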

See test/Transforms/SimplifyCFG/MagicPointer.ll.

The ideal codegen for this should be:
_f:                                     ## @f
## BB#0:                                ## %entry
        cmpq    $4, %rdi
        ja      LBB0_6
## BB#1:                                ## %entry
        leaq    LJTI0_0(%rip), %rax
        movslq  (%rax,%rdi,4), %rcx
        addq    %rax, %rcx
        jmpq    *%rcx
LBB0_2:                                 ## %if.then
        leaq    L_.str(%rip), %rdi
        jmp     _puts                   ## TAILCALL
LBB0_3:                                 ## %if.then2
        leaq    L_.str1(%rip), %rdi
        jmp     _puts                   ## TAILCALL
LBB0_4:                                 ## %if.then9
        leaq    L_.str2(%rip), %rdi
        jmp     _puts                   ## TAILCALL
LBB0_5:                                 ## %if.then14
        leaq    L_.str3(%rip), %rdi
LBB0_6:                                 ## %if.else16
        jmp     _puts

But this requires duplicating the call earlier.

> 
> Did this help a test case?

MagicPointer.ll is the motivating test case. We don't have a solution yet. rdar://8928086.

Evan

> 
> /jakob
>