[llvm-commits] [llvm] r124526 - in /llvm/trunk: include/llvm/Transforms/Utils/ lib/CodeGen/ lib/Transforms/Scalar/ lib/Transforms/Utils/ test/CodeGen/X86/ test/Transforms/JumpThreading/ test/Transforms/SimplifyCFG/

Wed Feb 2 11:14:15 PST 2011

On Feb 2, 2011, at 10:51 AM, Evan Cheng wrote:

> 
> On Feb 2, 2011, at 10:30 AM, Jakob Stoklund Olesen wrote:
> 
>> 
>> On Feb 2, 2011, at 10:17 AM, Evan Cheng wrote:
>> 
>>> 
>>> On Feb 2, 2011, at 9:17 AM, Bob Wilson wrote:
>>> 
>>>> 
>>>> On Jan 28, 2011, at 8:46 PM, Evan Cheng wrote:
>>>>> Modified: llvm/trunk/lib/CodeGen/TailDuplication.cpp
>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TailDuplication.cpp?rev=124526&r1=124525&r2=124526&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/CodeGen/TailDuplication.cpp (original)
>>>>> +++ llvm/trunk/lib/CodeGen/TailDuplication.cpp Fri Jan 28 22:46:23 2011
>>>>> @@ -502,7 +505,7 @@
>>>>> }
>>>>> // Heuristically, don't tail-duplicate calls if it would expand code size,
>>>>> // as it's less likely to be worth the extra cost.
>>>>> -  if (InstrCount > 1 && HasCall)
>>>>> +  if (InstrCount > 1 && (PreRegAlloc && HasCall))
>>>> 
>>>> What is this change for?  The other tail duplication change in this patch enables the pre-RA pass to handle returns, and I understand that part of it.  This part changes the post-RA tail dup pass to be more aggressive duplicating blocks with calls, and I don't see the connection between that and the rest of your patch.  If there's a good reason for this, could you add a comment to explain?
>>> 
>>> I was looking at a test case and didn't quite understand why calls are not being duplicated. I assume the reason is that call is often a register allocation barrier. Before register allocation, there is no way to know whether the call would cause register spills around it.
>> 
>> After register allocation, a call still takes a long time to execute, so the relative benefit of duplicating the block is quite small.
> 
> I don't get this argument. The call is going to be executed either way, we are only concerned about whether getting rid of the unconditional branch is worth the code size increase. Isn't a  branch + call, which is just a branch to a branch, bad? The case I really want to get is duplicating a tail call into the predecessor.
> 
> See test/Transforms/SimplifyCFG/MagicPointer.ll.
> 
> The ideal codegen for this should be:
> _f:                                     ## @f
> ## BB#0:                                ## %entry
>        cmpq    $4, %rdi
>        ja      LBB0_6
> ## BB#1:                                ## %entry
>        leaq    LJTI0_0(%rip), %rax
>        movslq  (%rax,%rdi,4), %rcx
>        addq    %rax, %rcx
>        jmpq    *%rcx
> LBB0_2:                                 ## %if.then
>        leaq    L_.str(%rip), %rdi
>        jmp     _puts                   ## TAILCALL
> LBB0_3:                                 ## %if.then2
>        leaq    L_.str1(%rip), %rdi
>        jmp     _puts                   ## TAILCALL
> LBB0_4:                                 ## %if.then9
>        leaq    L_.str2(%rip), %rdi
>        jmp     _puts                   ## TAILCALL
> LBB0_5:                                 ## %if.then14
>        leaq    L_.str3(%rip), %rdi
> LBB0_6:                                 ## %if.else16
>        jmp     _puts
> 
> 
> But this require duplicating the call earlier.
> 
>> 
>> Did this help a test case?
> 
> MagicPointer.ll is the motivating test case. We don't have a solution yet. rdar://8928086.

I looked at the history of that code -- it goes all the way back to Dan's initial commit for tail duplication, so I doubt if anyone has ever looked at the effects of that heuristic in practice.  If your change helps a test case, then let's do it.  Could you update the comment, though?  What it says now is wrong, and it would be good to mention what you said about spills around calls in the pre-RA pass.