[llvm-commits] patch: PUSH/POP optimization for THUMB

Tue Oct 23 15:01:47 PDT 2012

Hi Evan and Gordon,

Thank you very much for your feedback.

First of all, I agree with what Gordon said that ARMv7 and after have a
wider data bus to hide the load-multi latency. But still we can save the
power in this way. More important, as Evan has pointed out, this patch is
primarily a code size optimization, enabled by "-mthumb -O.." (as long as
not "-O0"). I didn't limited it to "-Os" because sometime people want
"-mthumb -O3/-O2" and this optimization won't hurt them.

Second, this case comes up fairly frequent. Although I don't have
statistics I do seem them a lot, and GCC uses the same stragegy as I
implemented, that's why I worked on this. BTW. no matter how frequently
this case comes up, the benefit comes for free. Plus, we enable this
optimization for codesize mainly.

Finally, let's talk about the complexity.
The function hasLiveStackObjects() contains small simple calculations only
and has been decleared as "inline". It looks cleaner in such orgnization,
but it can be flatten into the caller if you prefer.

I was trying to simplify it, but there is no easy way to do so because it
involves two passes -- ARMFrameLowering (to set part of the information)
and PEI (to finally insert the instructions). (PEI does have reasons to
have been so complicated and messy.)

Actrually, the main "complexity" in this patch comes from the fact that I
need to pre-calculate (via hasLiveStackObjects()) to tell if an
SP-adjustment instruction will be following, since the instruction itself
has not yet been inserted by this early. The code in hasLiveStackObjects()
is in fact partially redundant to some code in PEI.

An alternative way is to remove this condition (as well as
hasLiveStackObjects()), and do stack-padding unconditionally:

##################
foo:
 push  {r11, lr}
 ...
 pop {r11, pc}
##################

would be changed to:
##################
foo:
 push  {lr}
 sub sp, #4
 ...
 add sp, #4
 pop {pc}
##################

But, I don't know if people is going to like this.

In short, the patch is not very complicated as it seemed, and not easy to
simplify. But if you do have any good idea to significantly simplify it,
please let me know and I would happy to rethink it.

Thanks.
-Liang

> I see this as primarily a code size optimization. That's fine as it is.
> However, the implementation should be localized in one spot for a small
> optimization like this. The PEI code is already fairly messy and
> complicated and we should avoid making it worse.
>
> Evan
>
> On Oct 20, 2012, at 8:19 PM, Gordon Keiser <gkeiser at arxan.com> wrote:
>
>> I was just looking at the instruction timings for load/store vs.
>> load/store multiple.    It looks like on modern chips both of these will
>> eat the same number of cycles, but on ARMv6T2 and older the extra
>> register being pushed might introduce an extra cycle of register lock
>> delay in the rare case where LR is used immediately (by BL maybe).
>>
>> At the end of the function, it seems like a POP of multiple items
>> (including PC) will always take at least a cycle longer than a single
>> pop of the PC.
>>
>> ARM is somewhat sparse on their timing data for newer chips, though,
>> saying:   "The complexity of the Cortex-A9 processor makes it impossible
>> to calculate precise timing information manually."
>>
>> Information from
>> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0338g/DDI0338G_arm1156t2s_r0p4_trm.pdf
>> and
>> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/DDI0388I_cortex_a9_r4p1_trm.pdf
>>
>> So maybe this optimization should be run at -Os builds and / or when
>> building for v6T2 and below.
>>
>> Gordon Keiser
>> Software Development Engineer
>> Arxan Technologies
>> gkeiser at arxan.com  www.arxan.com
>>
>>> -----Original Message-----
>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
>>> bounces at cs.uiuc.edu] On Behalf Of Evan Cheng
>>> Sent: Saturday, October 20, 2012 2:14 PM
>>> To: liangh at codeaurora.org
>>> Cc: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm-commits] patch: PUSH/POP optimization for THUMB
>>>
>>> Hi Liang,
>>>
>>> It seems the cost is too high for the reward. I understand the benefit
>>> of using a
>>> 16-bit instruction here but does this comes up frequently? I would have
>>> been
>>> ok with this if it's a localized change. But your proposal includes a
>>> target hook
>>> hasLiveStackObjects() that seems to serve no purpose other than this
>>> minor
>>> optimization. Is it possible for you to significantly simplify the
>>> patch?
>>>
>>> Evan
>>>
>>> On Oct 19, 2012, at 4:04 PM, liangh at codeaurora.org wrote:
>>>
>>>> Hi,
>>>>
>>>> This patch implements a PUSH/POP optimization.
>>>>
>>>> Currently, LLVM pushes/pops a dummy register to keep the stack aligned
>>>> when the number of registers need to be pushed/popped is an odd.
>>>> For example: "r11" here is pushed as a pad to align the stack.
>>>> ##################
>>>> foo:
>>>> push  {r11, lr}
>>>> sub sp, #8
>>>> add r0, sp, #4
>>>> bl  goo
>>>> add sp, #8
>>>> pop {r11, pc}
>>>> ##################
>>>>
>>>> Enabled by this patch, the stack can be padded to align if the PUSH
>>>> instruction is followed by an SP-adjusting instruction, so that the
>>>> useless efforts of storing and loading the dummy register can be
>>>> saved.
>>>> Also, a PUSH instruction can be encoded with 16 but not 32 bits if it
>>>> is pushing a single register.
>>>> For example, the above code will be changed to:
>>>> ##################
>>>> foo:
>>>> push  {lr}
>>>> sub sp, #12
>>>> add r0, sp, #4
>>>> bl  goo
>>>> add sp, #12
>>>> pop {pc}
>>>> ##################
>>>>
>>>> This optimization only takes effect in thumb mode with a non-zero
>>>> optimization level.
>>>>
>>>> Could you please review the attached patch?
>>>> Thanks.
>>>>
>>>> -Liang<0001-PUSH-POP-
>>> Optimization.patch>______________________________
>>>> _________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>