[llvm] r242714 - [ARM] Refactor the prologue/epilogue emission to be more robust.

Mon Nov 30 11:14:00 PST 2015

Hi Tim and Jon,

> On Nov 30, 2015, at 10:51 AM, Jonathan Roelofs <jonathan at codesourcery.com> wrote:
> 
> 
> 
> On 11/30/15 10:42 AM, Quentin Colombet wrote:
>> Hi Jon,
>> 
>> 
>>> On Nov 25, 2015, at 11:25 AM, Jonathan Roelofs
>>> <jonathan at codesourcery.com <mailto:jonathan at codesourcery.com>> wrote:
>>> 
>>> 
>>> 
>>> On 7/20/15 3:42 PM, Quentin Colombet wrote:
>>>> Author: qcolombet
>>>> Date: Mon Jul 20 16:42:14 2015
>>>> New Revision: 242714
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=242714&view=rev
>>>> Log:
>>>> [ARM] Refactor the prologue/epilogue emission to be more robust.
>>>> 
>>>> This is the first step toward supporting shrink-wrapping for this target.
>>>> 
>>>> The changes could be summarized by these items:
>>>> - Expand the tail-call return as part of the expand pseudo pass.
>>>> - Get rid of the assumptions that the epilogue is the exit block:
>>>>  * Do not assume which registers are free in the epilogue. (This
>>>> indirectly
>>>>    improve the lowering of the code for the segmented stacks, see
>>>> the test
>>>>    cases.)
>>>>  * Take into account that the basic block can be empty.
>>>> 
>>>> Related to <rdar://problem/20821730>
>>>> 
>>> 
>>> snip
>>> 
>>>> Modified: llvm/trunk/test/CodeGen/ARM/thumb1_return_sequence.ll
>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/thumb1_return_sequence.ll?rev=242714&r1=242713&r2=242714&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/ARM/thumb1_return_sequence.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/ARM/thumb1_return_sequence.ll Mon Jul 20
>>>> 16:42:14 2015
>>>> @@ -23,11 +23,9 @@ entry:
>>>> ; --------
>>>> ; CHECK-V4T:         add sp,
>>>> ; CHECK-V4T-NEXT:    pop {[[SAVED]]}
>>>> -; CHECK-V4T-NEXT:    mov r12, r3
>>>> -; CHECK-V4T-NEXT:    pop {r3}
>>>> -; CHECK-V4T-NEXT:    mov lr, r3
>>>> -; CHECK-V4T-NEXT:    mov r3, r12
>>>> -; CHECK-V4T:         bx  lr
>>>> +; We do not have any SP update to insert so we can just optimize
>>>> +; the pop sequence.
>>>> +; CHECK-V4T-NEXT:    pop {pc}
>>> 
>>> This ^ is an invalid optimization. The awkward sequence there is
>>> necessary because:
>>> 
>>> Arch constraints:
>>>  * v4t can't change the thumb bit via a `pop { pc }` insruction
>>>  * v4t can only pop into lo registers and the pc
>>>      (and bit[0] of the value popped into pc must be 0 if pc is in
>>>       the list)
>>>  * v4t can't move from a lo register to a lo register
>>> 
>>> AAPCS constraints:
>>>  * r3 needs to be saved off in case it is used as part of the return
>>>      value.
>>>  * r12 is the only register we're allowed to clobber.
>> 
>> Maybe the test or check line is just poorly written because the code
>> that does this optimization is guarded under  STI.hasV5TOps(), i.e.,
>> this should not happen for v4t only arch.
> 
> The problem is in the changes under the `if (ArgRegsSaveSize || IsV4PopReturn)` guard:
> 
> https://github.com/llvm-mirror/llvm/commit/b2dab382ce01d602a757906160fccd9129c768f7#diff-5daa733fad8e67c69c0c5291dbb05ebbR392
> 
> which did have `STI.hasV4TOps() && !STI.hasV5TOps()` as part of its condition.  This was conservatively correct, but sub-optimal for v5t+.

Thanks for pointing that out.
I see the problem now.

I’ll push a patch shortly.

Cheers,
-Quentin
> 
>> 
>> Do you have an actual test case where this fails?
> 
> No, I don't. I can't find the original testcases that motivated r214928 either (other than the ones in the commit).
> 
> 
> Jon
> 
>> 
>> Thanks,
>> -Quentin
>> 
>>> 
>>>> ; CHECK-V5T:         pop {[[SAVED]], pc}
>>>> }
>>>> 
>>>> @@ -53,19 +51,19 @@ entry:
>>>> ; Epilogue
>>>> ; --------
>>>> ; CHECK-V4T:         pop {[[SAVED]]}
>>>> -; CHECK-V4T-NEXT:    mov r12, r3
>>>> -; CHECK-V4T-NEXT:    pop {r3}
>>>> +; CHECK-V4T-NEXT:    mov r12, [[POP_REG:r[0-7]]]
>>>> +; CHECK-V4T-NEXT:    pop {[[POP_REG]]}
>>>> ; CHECK-V4T-NEXT:    add sp,
>>>> -; CHECK-V4T-NEXT:    mov lr, r3
>>>> -; CHECK-V4T-NEXT:    mov r3, r12
>>>> +; CHECK-V4T-NEXT:    mov lr, [[POP_REG]]
>>>> +; CHECK-V4T-NEXT:    mov [[POP_REG]], r12
>>>> ; CHECK-V4T:         bx  lr
>>>> ; CHECK-V5T:         add sp,
>>>> ; CHECK-V5T-NEXT:    pop {[[SAVED]]}
>>>> -; CHECK-V5T-NEXT:    mov r12, r3
>>>> -; CHECK-V5T-NEXT:    pop {r3}
>>>> +; CHECK-V5T-NEXT:    mov r12, [[POP_REG:r[0-7]]]
>>>> +; CHECK-V5T-NEXT:    pop {[[POP_REG]]}
>>>> ; CHECK-V5T-NEXT:    add sp,
>>>> -; CHECK-V5T-NEXT:    mov lr, r3
>>>> -; CHECK-V5T-NEXT:    mov r3, r12
>>>> +; CHECK-V5T-NEXT:    mov lr, [[POP_REG]]
>>>> +; CHECK-V5T-NEXT:    mov [[POP_REG]], r12
>>>> ; CHECK-V5T-NEXT:    bx lr
>>>> }
>>>> 
>>>> @@ -95,8 +93,7 @@ entry:
>>>> ; Epilogue
>>>> ; --------
>>>> ; CHECK-V4T:    pop {[[SAVED]]}
>>>> -; CHECK-V4T:    pop {r3}
>>>> -; CHECK-V4T:    bx r3
>>>> +; CHECK-V4T:    pop {pc}
>>> 
>>> This ^ is also invalid for the same reason.
>>> 
>>>> ; CHECK-V5T:    pop {[[SAVED]], pc}
>>>> }
>>>> 
>>>> @@ -148,14 +145,18 @@ entry:
>>>> ; --------
>>>> ; CHECK-V4T:         add sp,
>>>> ; CHECK-V4T-NEXT:    pop {[[SAVED]]}
>>>> -; CHECK-V4T-NEXT:    pop {r3}
>>>> +; Only r1 to r3 are available to pop LR.
>>>> +; r0 is used for the return value.
>>>> +; CHECK-V4T-NEXT:    pop {[[POP_REG:r[1-3]]]}
>>>> ; CHECK-V4T-NEXT:    add sp,
>>>> -; CHECK-V4T-NEXT:    bx r3
>>>> +; CHECK-V4T-NEXT:    bx [[POP_REG]]
>>>> ; CHECK-V5T:         add sp,
>>>> ; CHECK-V5T-NEXT:    pop {[[SAVED]]}
>>>> -; CHECK-V5T-NEXT:    pop {r3}
>>>> +; Only r1 to r3 are available to pop LR.
>>>> +; r0 is used for the return value.
>>>> +; CHECK-V5T-NEXT:    pop {[[POP_REG:r[1-3]]]}
>>>> ; CHECK-V5T-NEXT:    add sp,
>>>> -; CHECK-V5T-NEXT:    bx r3
>>>> +; CHECK-V5T-NEXT:    bx [[POP_REG]]
>>>> }
>>>> 
>>>> ; CHECK-V4T-LABEL: noframe
>>>> @@ -191,13 +192,17 @@ entry:
>>>> ; Epilogue
>>>> ; --------
>>>> ; CHECK-V4T:         pop {[[SAVED]]}
>>>> -; CHECK-V4T-NEXT:    pop {r3}
>>>> +; Only r1 to r3 are available to pop LR.
>>>> +; r0 is used for the return value.
>>>> +; CHECK-V4T-NEXT:    pop {[[POP_REG:r[1-3]]]}
>>>> ; CHECK-V4T-NEXT:    add sp,
>>>> -; CHECK-V4T-NEXT:    bx r3
>>>> +; CHECK-V4T-NEXT:    bx [[POP_REG]]
>>>> ; CHECK-V5T:         pop {[[SAVED]]}
>>>> -; CHECK-V5T-NEXT:    pop {r3}
>>>> +; Only r1 to r3 are available to pop LR.
>>>> +; r0 is used for the return value.
>>>> +; CHECK-V5T-NEXT:    pop {[[POP_REG:r[1-3]]]}
>>>> ; CHECK-V5T-NEXT:    add sp,
>>>> -; CHECK-V5T-NEXT:    bx r3
>>>> +; CHECK-V5T-NEXT:    bx [[POP_REG]]
>>>> }
>>>> 
>>>> declare void @llvm.va_start(i8*) nounwind
>>>> 
>>> 
>>> snip
>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> 
>>> 
>>> --
>>> Jon Roelofs
>>> jonathan at codesourcery.com <mailto:jonathan at codesourcery.com>
>>> CodeSourcery / Mentor Embedded
>> 
> 
> -- 
> Jon Roelofs
> jonathan at codesourcery.com
> CodeSourcery / Mentor Embedded