[PATCH] ARM and Thumb Segmented Stacks

Fri Feb 28 11:51:57 PST 2014

Hi Tim,
I am the developer of the Thumb patch and would like to add some context to
the decisions in it. The patch is meant to be usable on any target that
supports the Thumb instruction set. Thumb2 instructions were deliberately
avoided so that it can run on targets as small as the Cortex-M0 (or even
smaller). These are not old/weird architectures but rather just limited in
their functionality. The MRC instruction is not supported because there is
no co-processor hardware.

The compiled code can be running on a bare-bones system without an OS, in a
single "thread" (i.e. no multitasking, only interrupts). This is the lowest
common denominator which the patch tries to support. The value at the
STACK_LIMIT address should point to the thread-local stack limit slot, which
in the described case would be just a constant defined by the linker. In the
Linux case from the ARM patch it would point to 0xFFFF0FF0.
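
To make the intended check concrete, here is a rough host-side sketch in C,
not the code the patch emits. The names stack_limit_slot and
needs_more_stack, and the address 0x20000400, are made up for illustration;
only STACK_LIMIT comes from the patch, and I am assuming a stack that grows
downward.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the stack limit slot. On a bare-metal target
 * this value would be a constant defined by the linker. */
static uintptr_t stack_limit_slot = 0x20000400u;

/* STACK_LIMIT holds the address of the slot, as described above. */
static const uintptr_t *const STACK_LIMIT = &stack_limit_slot;

/* Returns true when a frame of frame_size bytes would cross the limit,
 * i.e. when the prologue would have to call __morestack.
 * Assumes a downward-growing stack. */
static bool needs_more_stack(uintptr_t sp, uintptr_t frame_size)
{
    uintptr_t limit = *STACK_LIMIT; /* one load, no function call */

    if (sp < frame_size)            /* sp - frame_size would wrap around */
        return true;
    return (sp - frame_size) < limit;
}
```

The point of the sketch is that fetching the limit costs a single load
through STACK_LIMIT rather than an extra call in every prologue.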

The code in such systems is executing from flash memory directly, possibly
even without instruction caches enabled. Having each function call branch
again to fetch the stack limit (which could be just a constant) would
destroy even the little performance these devices possess.

Do you think a patch which optimizes __aeabi_read_tp to a load from
STACK_LIMIT would be acceptable? Is it even possible to decide whether
MRC is available or a fallback should be used?

Regards,
Svetoslav.

On Fri, 28 Feb 2014 10:35:57 +0000
Tim Northover wrote:

> From: Tim Northover 
> To: Alex Crichton 
> Newsgroups: gmane.comp.compilers.llvm.cvs
> Subject: Re: [PATCH] ARM and Thumb Segmented Stacks
> Date: Fri, 28 Feb 2014 10:35:57 +0000
> 
> Hi Alex,
> 
> > Due to the prologue being executed for all function calls, I believe
> > that by design it strives to not need an extra function call per
> > function call. On x86/x86_64, special slots in the OS's TCB are
> > pre-selected by LLVM for the stack limit, and these slots can't change
> > without modifying LLVM. I think this is just following the same
> > precedent. I would expect this to be a choice made at compile time
> > rather than runtime of where the stack limit is located.
> 
> Hmm. How about going via the generic "TPsoft" pseudo-instruction then?
> (And "tTPsoft" for Thumb).
> 
> Currently, this always expands to an "__aeabi_read_tp" call, but I
> think it would be reasonable to propose a second patch that instead
> uses the MRC on v7-Linux (and possibly general v7 gnueabi; we'd have
> to ask around for when it's a safe optimisation).
> 
> That should get you the efficient MRC when it's right, and some kind
> of functional fall-back for the weird old architectures.
> 
> >> Thanks, that sounds fine. Actually, I've realised the call(s) should
> >> be "blx" rather than "bl" for similar reasons.
> >
> > I'm having a bit of difficulty getting this to compile and run. I can
> > get it to emit assembly, but it ends up tripping an assertion in the
> > assembler.
> 
> Very odd, I can't think of any particular peculiarities for BLX that
> aren't present for BL too. Which assertion is triggering?
> 
> > I only found one other use case in ARMFastISel, and it
> > looks like I need a register operand that needs some serious code to
> > materialize, so I got a little stuck. According to ARM's
> > documentation, it appears that "blx label" is valid, so I'm a little
> > confused why a register in play is necessary.
> 
> The ARM::BLXi and ARM::tBLXi instructions should be the ones that take
> a label rather than an explicit register (tracking implicit liveness
> is another matter that we're going to have to think about with this
> patch anyway, once the outline is secure).
> 
> > How necessary is it to support switching between arm/thumb modes
> > though? I would figure that ARM code would use an ARM __morestack
> > function and Thumb code would use a Thumb __morestack function,
> 
> The problem is that the same binary can contain both ARM and Thumb
> functions, but there is only one __morestack symbol (barring linker
> tricks) which is shared.
> 
> Cheers.
> 
> Tim.