[PATCH] ARM and Thumb Segmented Stacks

Fri Feb 28 12:40:18 PST 2014

Hi Svetoslav,

> I am the developer of the thumb patch and would like to add some context to
> the decisions in it. The patch is meant to be usable on any target that supports
> the thumb instruction set.

OK, if you want to go that route then Thumb-1 implementations have
their own frame lowering code for the most part (in
Thumb1FrameLowering.cpp). ARMFrameLowering.cpp is shared between ARM
and Thumb2 since the instruction sets are mostly compatible.

> Thumb2 instructions were not used on purpose so
> it can execute on targets as low as Cortex-M0 (or even lower). These are not
> old/weird architectures but rather just limited in their functionality.

Cortex-M0 is weird. The rest are old *and* weird. They also have
extremely varied platform support and interfaces (that LLVM is not in
the business of catering to at the moment).

> The code in such systems is executing from flash memory directly, possibly
> even without instruction caches enabled. Having each function call branch
> again to fetch the stack limit (could be just a constant) would destroy even
> the little performance these devices possess.

Perhaps, but LLVM should not be expected to support every tin-pot
RTOS's idea of how this stack limit should be defined. I'm getting
enough heebie-jeebies from embedding the "TP + 4*63" address in LLVM,
but at least it has some history of working and being used.
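
For anyone unfamiliar with the "TP + 4*63" convention, here is a minimal
sketch of what it amounts to, assuming a 32-bit target where the thread
pointer addresses a block of 32-bit words and word 63 holds the stack
limit. The names (STACK_LIMIT_SLOT, stack_limit_offset, read_stack_limit)
are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the stack limit lives in word slot 63 of the block that
   the thread pointer addresses, i.e. at byte offset 4*63 on a 32-bit
   target. The name is illustrative only. */
enum { STACK_LIMIT_SLOT = 63 };

/* Byte offset of the slot from the thread pointer. */
static size_t stack_limit_offset(void) {
    return sizeof(uint32_t) * STACK_LIMIT_SLOT;
}

/* Read the limit from a thread block modelled as an array of 32-bit
   words. On real hardware tp would come from the OS, not from an
   ordinary array. */
static uint32_t read_stack_limit(const uint32_t *tp) {
    assert(tp != NULL);
    return tp[STACK_LIMIT_SLOT];
}
```

Nothing here says what the OS actually keeps in that slot; the slot
number is baked in and is only meaningful on platforms that honour it.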

> Do you think a patch which optimizes __aeabi_read_tp to a load from
> STACK_LIMIT would be acceptable?

No. For a start the thread pointer is used for many things other than
the stack limit.

Second, the only references I see to that symbol in a Google search
appear to be from this work. It's not even close to a standard (even a
de facto one) and it clashes with permitted C identifiers. Someone is
entirely within their rights to write "int STACK_LIMIT = 42;" in their
program.

Even ignoring that, I think the stack-limit-via-a-pointer idea is
horribly flawed as any kind of default. There's basically no way
forward if Linux decides to deprecate and remove this magical
0xffff0ff0 address and redirect all TLS via TPIDRURW.

The only really platform-agnostic way to handle these things is via a
runtime call. Beyond that I can see "v7 Linux says MRC is good"
optimisations getting in, but not "ARM2 RiscOS wants to dance thrice
widdershins around some goat entrails".
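
To make the runtime-call idea concrete, here is a rough sketch in C of
the check a function prologue would perform. It is written under
assumptions: hypothetical_get_stack_limit is an invented name standing
in for whatever hook each platform's runtime would provide, and a plain
variable plays the part of the OS. It is not what the patch emits.

```c
#include <stdint.h>

/* Stand-in for platform state; a real runtime would read this from
   wherever its OS keeps the current thread's stack limit. */
static uintptr_t demo_limit = 0x1000;

/* Hypothetical runtime hook: the only thing compiler-generated code
   would need to know about is this one symbol. */
uintptr_t hypothetical_get_stack_limit(void) {
    return demo_limit;
}

/* Nonzero when a frame of frame_size bytes would dip below the limit
   (the stack grows downwards), i.e. when more stack must be obtained
   before the function body runs. */
int prologue_needs_more_stack(uintptr_t sp, uintptr_t frame_size) {
    uintptr_t limit = hypothetical_get_stack_limit();
    return sp < limit || sp - limit < frame_size;
}
```

The cost is a call per prologue, which is exactly the objection raised
above, but where the limit lives stays the runtime's business.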

> Is it even possible to decide whether MCR is available or a fallback should be used?

We can certainly tell whether MRC and TPIDRURW exist on the CPU at
compile-time, but whether it actually maps to any kind of thread base
is the decision of the OS-writer, as is whether reading from "TP+4*63"
will launch nukes at Russia or give you the stack limit.
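
As an illustration of the compile-time half of that, a conservative
check can be written with the ACLE predefined macros. This is only a
sketch: the function name is invented, and it deliberately answers "no"
for anything that is not a 32-bit ARMv7-or-later A/R-profile target,
including cores that might in fact have the register.

```c
/* Nonzero when compiling for a 32-bit ARM target of architecture v7 or
   later in the A or R profile, where the CP15 thread ID registers can
   be read with MRC. Says nothing about what the OS stores in them. */
static int cpu_has_cp15_thread_register(void) {
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7) && \
    defined(__ARM_ARCH_PROFILE) && \
    (__ARM_ARCH_PROFILE == 'A' || __ARM_ARCH_PROFILE == 'R')
    return 1;
#else
    return 0;
#endif
}
```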

Cheers.

Tim.