[llvm-commits] patch: PUSH/POP optimization for THUMB

Sat Oct 20 20:19:57 PDT 2012

I was just looking at the instruction timings for load/store vs. load/store multiple. It looks like on modern chips both take the same number of cycles, but on ARMv6T2 and older the extra register being pushed might introduce an extra cycle of register lock delay in the rare case where LR is used immediately (perhaps by a BL).

At the end of the function, it seems like a POP of multiple items (including PC) will always take at least a cycle longer than a single pop of the PC.  

ARM publishes little timing data for newer chips, though, saying: "The complexity of the Cortex-A9 processor makes it impossible to calculate precise timing information manually."

Information from http://infocenter.arm.com/help/topic/com.arm.doc.ddi0338g/DDI0338G_arm1156t2s_r0p4_trm.pdf
 and http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/DDI0388I_cortex_a9_r4p1_trm.pdf

So maybe this optimization should be enabled for -Os builds and/or when targeting ARMv6T2 and below.

Gordon Keiser
Software Development Engineer
Arxan Technologies
gkeiser at arxan.com  www.arxan.com

> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> bounces at cs.uiuc.edu] On Behalf Of Evan Cheng
> Sent: Saturday, October 20, 2012 2:14 PM
> To: liangh at codeaurora.org
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] patch: PUSH/POP optimization for THUMB
> 
> Hi Liang,
> 
> It seems the cost is too high for the reward. I understand the benefit of using a
> 16-bit instruction here but does this come up frequently? I would have been
> ok with this if it's a localized change. But your proposal includes a target hook
> hasLiveStackObjects() that seems to serve no purpose other than this minor
> optimization. Is it possible for you to significantly simplify the patch?
> 
> Evan
> 
> On Oct 19, 2012, at 4:04 PM, liangh at codeaurora.org wrote:
> 
> > Hi,
> >
> > This patch implements a PUSH/POP optimization.
> >
> > Currently, LLVM pushes/pops a dummy register to keep the stack aligned
> > when the number of registers that need to be pushed/popped is odd.
> > For example: "r11" here is pushed as a pad to align the stack.
> > ##################
> > foo:
> >  push  {r11, lr}
> >  sub sp, #8
> >  add r0, sp, #4
> >  bl  goo
> >  add sp, #8
> >  pop {r11, pc}
> > ##################
> >
> > With this patch, the alignment padding is instead folded into the SP
> > adjustment whenever the PUSH instruction is followed by an SP-adjusting
> > instruction, which saves the useless store and load of the dummy register.
> > Also, a PUSH instruction can be encoded in 16 bits rather than 32 if it
> > is pushing a single register.
> > For example, the above code will be changed to:
> > ##################
> > foo:
> >  push  {lr}
> >  sub sp, #12
> >  add r0, sp, #4
> >  bl  goo
> >  add sp, #12
> >  pop {pc}
> > ##################
> >
> > This optimization only takes effect in thumb mode with a non-zero
> > optimization level.
> >
> > Could you please review the attached patch?
> > Thanks.
> >
> > -Liang
> > <0001-PUSH-POP-Optimization.patch>
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
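
For reference, the padding arithmetic behind the two listings quoted above can be sketched as follows. This is only an illustration, not code from the patch: the function name plan_prologue and the constants are hypothetical, and it assumes 4-byte registers, an 8-byte stack alignment requirement, and a local frame size that is already a multiple of 8.

```python
WORD = 4         # bytes per pushed register (assumption: 32-bit ARM/Thumb)
STACK_ALIGN = 8  # assumed stack alignment requirement in bytes


def plan_prologue(num_saved_regs, local_bytes):
    """Return ((regs_pushed, sp_adjust), (regs_pushed, sp_adjust)) for the
    current and the proposed prologue. Hypothetical helper for illustration.
    """
    push_bytes = num_saved_regs * WORD
    # Bytes needed to bring the pushed area up to the alignment boundary.
    pad = (-push_bytes) % STACK_ALIGN

    # Current behaviour: pad by pushing dummy registers.
    current = (num_saved_regs + pad // WORD, local_bytes)

    # Proposed behaviour: fold the padding into the SP adjustment, but only
    # when an SP-adjusting instruction follows the PUSH.
    if local_bytes > 0:
        proposed = (num_saved_regs, local_bytes + pad)
    else:
        proposed = current
    return current, proposed


# The foo example: one saved register (lr) and 8 bytes of locals.
current, proposed = plan_prologue(1, 8)
print(current)   # registers pushed and "sub sp" amount today
print(proposed)  # registers pushed and "sub sp" amount with the patch
```

For the foo example this gives two registers pushed with "sub sp, #8" today, versus one register pushed with "sub sp, #12" under the proposal. When the register count is already even, or when there is no SP adjustment to fold the padding into, both plans come out the same.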