[PATCH] ARM and Thumb Segmented Stacks

Fri Feb 28 13:34:03 PST 2014

Hi Tim,

>OK, if you want to go that route then Thumb-1 implementations have
>their own frame lowering code for the most part (in
>Thumb1FrameLowering.cpp). ARMFrameLowering.cpp is shared between ARM
>and Thumb2 since the instruction-sets are mostly compatible.

I put it in ARMFrameLowering so I could target both Thumb1 and Thumb2
with the same code - Thumb1FrameLowering extends ARMFrameLowering
so it will re-use the code.

>Cortex-M0 is weird. The rest are old *and* weird. They also have
>extremely varied platform support and interfaces (that LLVM is not in
>the business of catering to at the moment).

OK, they might be :). I am a newbie on the ARM scene, so I wouldn't know.
But I know that there are new chip designs based on them, so they are
still of interest, at least in the embedded world.

>> The code in such systems is executing from flash memory directly, possibly
>> even without instruction caches enabled. Having each function call branch
>> again to fetch the stack limit (could be just a constant) would destroy even
>> the little performance these devices possess.
>Perhaps, but LLVM should not be expected to support every tin-pot
>RTOS's idea of how this stack limit should be defined. I'm getting
>enough heebie-jeebies from embedding the "TP + 4*63" address in LLVM,
>but at least it has some history of working and being used.

Of course, I agree. I was hoping, though, that there would be some way to
get the TP pointer/stack limit in an alternative, OS-independent way
(without incurring an additional branch).

>> Do you think a patch which optimizes __aeabi_read_tp to a load from
>> STACK_LIMIT would be acceptable?
>
>No. For a start the thread pointer is used for many things other than
>the stack limit.

OK, bad wording. What I meant was: do you think it would be possible
to optimize __aeabi_read_tp for niche platforms (i.e. the no-OS case)
using some generic approach? I understand that your position on this
is to use a runtime call, and that no other alternative is feasible.

>Second, the only references I see to that symbol in a Google search
>appear to be from this work. It's not even close to a standard (even a
>de facto one) and it clashes with permitted C identifiers. Someone is
>entirely within their rights to write "int STACK_LIMIT = 42;" in their
>program

Yes, I named it like this and wouldn't expect to keep the name. It was
just a "temporary" decision.

>Even ignoring that, I think the stack-limit-via-a-pointer idea is
>horribly flawed as any kind of default. There's basically no way
>forward if Linux decides to deprecate and remove this magical
>0xffff0ff0 address and redirect all TLS via TPIDRURW.

If it is a public contract, then I wouldn't worry about it changing any
more than about TLS moving out of TPIDRURW to somewhere else. That is
not the case though; I think this address is hidden behind a library
call.

For embedded it is not an issue since the program (firmware) is 
compiled as a whole and at one time.

>> Is it even possible to decide whether MCR is available or a fallback should be used?
>
>We can certainly tell whether MRC and TPIDRURW exist on the CPU at
>compile-time, but whether it actually maps to any kind of thread base
>is the decision of the OS-writer, as is whether reading from "TP+4*63"
>will launch nukes at Russia or give you the stack limit.

So for major platforms it can be hardcoded in the source. And niche
platforms should use the runtime call. Can there be a middle ground?
For example having a list of available conventions which can be
selected at compile time - much like the floating point soft vs hard case?

And even if the runtime call is used, how am I to know where to find
the stack limit in the returned tp address? This is again OS-specific. So
we are back to the same point.
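
To make the "list of conventions" idea a bit more concrete, here is a
rough C++ sketch of what I have in mind. Every name in it (the enum, the
struct, the option strings and the two symbols) is made up for
illustration; none of it is an existing LLVM option or API. Only the
"TP + 4*63" offset comes from the current patch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: names below are illustrative, not existing LLVM APIs.
enum class StackLimitConvention {
  TPOffset,     // limit stored at a fixed byte offset from the thread pointer
  GlobalSymbol, // limit stored in a named global (e.g. no-OS firmware)
  RuntimeCall   // limit obtained by calling a runtime helper
};

struct StackLimitInfo {
  StackLimitConvention Kind;
  std::int64_t Offset; // bytes from TP; only meaningful for TPOffset
  std::string Symbol;  // only meaningful for GlobalSymbol / RuntimeCall
};

// Map a user-selected convention name to a description of where the stack
// limit lives. Returns false (leaving Out untouched) for an unknown name.
bool parseStackLimitConvention(const std::string &Name, StackLimitInfo &Out) {
  if (Name == "tp-offset") {
    // The "TP + 4*63" slot discussed above, i.e. byte offset 252.
    Out = {StackLimitConvention::TPOffset, 4 * 63, ""};
    return true;
  }
  if (Name == "global-symbol") {
    // Placeholder symbol name; a real one would have to be agreed on.
    Out = {StackLimitConvention::GlobalSymbol, 0, "__example_stack_limit"};
    return true;
  }
  if (Name == "runtime-call") {
    // Placeholder helper name, assumed to return the limit directly.
    Out = {StackLimitConvention::RuntimeCall, 0, "__example_get_stack_limit"};
    return true;
  }
  return false;
}
```

The prologue emission would then switch on Kind when loading the limit,
so the OS-specific part is reduced to picking one entry from the list.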

Regards,
Svetoslav.