[LLVMdev] [RFC] [ARM] v6m: Suggestions for a slightly different set of default optimizer settings.

Jonathan Roelofs jonathan at codesourcery.com
Mon Jan 12 13:48:57 PST 2015



On 1/12/15 2:39 PM, Bjoern Haase wrote:
> On 12.01.2015 at 16:28, Jonathan Roelofs wrote:
>>
>>
>> On 1/11/15 12:35 PM, Bjoern Haase wrote:
>>> Hello to all.
>>>
>>> When studying forums and mailing lists it seems to me that llvm usage
>>> for very small arm v6m targets is not so common.
>> ...snip...
>>> For the most important v6m system, cortex M0 / M0+, the main speed
>>> bottlenecks were register pressure and the slow (2-cycle) overhead for
>>> memory accesses.  Besides special tricks, the asm code was improved by
>>> changing internal calling conventions (no callee-saved regs, all regs
>>> saved by caller), by replacing individual LDR/STR instructions with
>>> LDM/STM sequences operating on more registers, and by using the upper
>>> register half as a spill bank.
>>>
>>> Looking at those points, I suppose that the last one might be
>>> implemented in LLVM without too many problems. Basically, the idea is
>>> to use R8, R10, R11, R12 and R13 as temporary spill slots that can be
>>> accessed in only 1 cycle instead of the 2 cycles required for memory
>>> accesses. For our crypto code, we tried hard but in vain to use the
>>> upper registers for anything useful besides spill bank usage.
>>> If LLVM identifies large functions with lots of stack slots, it might be
>>> a good idea to consider adding the upper regs to the spill list and
>>> replacing stack slot accesses with register accesses where possible.
>> This sounds like a really interesting idea. One concern about this
>> would be the cost of spilling from one of these hi-reg spill slots
>> (since push & pop only operate on lo regs). Because of that, you'd
>> need to avoid using them to spill live ranges that cross calls.
> I did not quite get the point. All of those regs (except R13 of course)
> are required to be callee saved by the ABI. So you could safely use
> them to hold spilled data across calls. Of course, initial pushing and

Sorry, you're right. I was incorrectly thinking they were caller-saved.

From the AAPCS:

"A subroutine must preserve the contents of the registers r4-r8, r10,
r11 and SP (and r9 in PCS variants that designate r9 as v6).
In all variants of the procedure call standard, registers r12-r15 have
special roles. In these roles they are labeled IP, SP, LR and PC."
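
To make the consequence of that quote concrete, here is a rough sketch that
filters the proposed spill registers down to the ones that survive a call.
The register sets are taken from the AAPCS text quoted above; the helper
name usable_across_calls is hypothetical and only for illustration. r9 is
left out because its role depends on the PCS variant.

```python
# Registers a subroutine must preserve, per the AAPCS text quoted above.
# r9 is omitted on purpose: its role depends on the PCS variant.
CALLEE_SAVED = {"r4", "r5", "r6", "r7", "r8", "r10", "r11", "sp"}

# Registers with special roles, per the same quote.
SPECIAL_ROLES = {"r12": "IP", "r13": "SP", "r14": "LR", "r15": "PC"}


def usable_across_calls(candidates):
    """Return the candidates that can hold spilled data across a call.

    A register qualifies if the callee must preserve it and it is not
    one of the special-role registers (which also rules out SP).
    """
    return [
        reg for reg in candidates
        if reg in CALLEE_SAVED and reg not in SPECIAL_ROLES
    ]


# The spill bank proposed earlier in the thread.
proposed = ["r8", "r10", "r11", "r12", "r13"]
across_calls = usable_across_calls(proposed)
```

Under these assumptions only r8, r10 and r11 remain. r12 (IP) is not in the
preserved list, so it would only be usable for live ranges that do not cross
a call, and r13 is the stack pointer.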


Jon

> final popping is tedious. This is why it's beneficial only when
> heuristics indicate large functions with very many live registers, where
> the CPU is very busy with spilling and restoring. Actually, this is what
> happens when doing non-trivial 64-bit arithmetic.
>
> Yours,
> Björn
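
The trade-off described above can be sketched as a small cost model. The
2-cycle memory access and 1-cycle register move come from the thread. The
per-register save/restore overhead of 4 cycles is an assumption (roughly one
MOV plus one PUSH slot in the prologue and one POP slot plus one MOV in the
epilogue), and the function names are hypothetical. This is not how LLVM's
register allocator works; it only illustrates the kind of heuristic being
discussed.

```python
MEM_ACCESS_CYCLES = 2   # from the thread: LDR/STR on Cortex-M0/M0+
REG_MOVE_CYCLES = 1     # from the thread: MOV between a lo and a hi register

# Assumption, not from the thread: cycles spent in prologue and epilogue
# to preserve one callee-saved hi register.
SAVE_RESTORE_CYCLES = 4


def slot_gain(accesses, save_restore=SAVE_RESTORE_CYCLES):
    """Net cycles saved by keeping one stack slot in a hi register."""
    per_access_saving = MEM_ACCESS_CYCLES - REG_MOVE_CYCLES
    return accesses * per_access_saving - save_restore


def pick_slots(access_counts, available_hi_regs):
    """Choose which stack slots to move into hi registers.

    access_counts maps a slot name to its number of spill/reload
    accesses. The most frequently accessed slots are considered first,
    and a slot is taken only if its net gain is positive.
    """
    ranked = sorted(access_counts.items(), key=lambda kv: kv[1], reverse=True)
    chosen = []
    for slot, accesses in ranked[:available_hi_regs]:
        if slot_gain(accesses) > 0:
            chosen.append(slot)
    return chosen


# Example: three stack slots, two hi registers available.
counts = {"a": 10, "b": 3, "c": 6}
chosen = pick_slots(counts, available_hi_regs=2)
total_gain = sum(slot_gain(counts[s]) for s in chosen)
```

With these numbers, slots "a" and "c" are moved and slot "b" stays on the
stack, since three accesses do not pay for the save/restore overhead. That
matches the point that the scheme only helps in large functions with heavy
spill traffic.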

-- 
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded


