[PATCH] ARM + Thumb Segmented Stacks (v2)

Fri Mar 28 07:59:16 PDT 2014

> > If you search for CFI_INSTRUCTION in ARMFrameLowering.cpp, you will
> find all
> > of the places where we currently emit call frame instructions. There
> are two
> > things these instructions do:
> > * Let the debugger know, at any point in the function, how to
> calculate the
> > CFA (Canonical Frame Address). This is defined for ARM as the value
> of sp at
> > entry to the current function. This will need doing every time you
> update
> > the stack pointer (i.e. once for each push and pop instruction).
> > * Let the debugger know how to find the value of each register as it
> would
> > have been immediately before this function was called. This will need
> doing
> > each time you save or restore the value of a callee-saved register.
> >
> > This should be simple enough to implement, so I'd like it to be done
> as part
> > of this patch.
> 
> OK, I have attempted to add information, as well as add a test. I
> wasn't entirely sure what to for the pops, though. Most other pops I
> saw in the tests were immediately followed by cfi_endproc, which isn't
> quite accurate here. You may want to double check the test to make
> sure I didn't get anything wrong.

After the "push {lr}" (line 1872-ish), you only need to emit a CFI for the
lr. The offsets are all relative to the CFA, so the records for r4 and r5 do
not need to be updated, and the offset for sp will be -12.

Also, after the "pop {r4, r5}" at the beginning of the rest of the function
(line 1957-ish), you need to emit .cfi_same_value for r4 and r5, to let the
debugger know that the previous stack frame's values of r4 and r5 are now
found in the actual registers, not on the stack. This is important, because
the values in the caller's stack frame will continue to be wrong while the
function body is executing, which will be visible to the user.

> > 4) I don't think we need to push/pop lr around the call to
> __morestack.
> 
> Wouldn't the "bl __morestack" clobber the link register? If it were
> just "b __morestack" then how would __morestack know how to call the
> original function on the new stack?

Ah, of course. My mistake.

> > 5) You are currently emitting 32-bit push and pop instructions, where
> 16-bit
> > encodings will do. The only push/pop instruction which you are using
> which
> > does not have a 16-bit encoding is "ldm.w   sp!, {lr}" (aka push.w
> {lr}).
> >
> > Optimisations (4) and (5) should be fairly simple, so I think they
> should be
> > done as part of this patch, but the others are a bit more involved,
> so could
> > wait until a later patch.
> 
> I didn't implement (4) due to my above confusion (but if I'm wrong,
> I'm happy to change it).
> 
> I've attempted to implement (5), but if I chose the wrong opcodes,
> just let me know.

Most of them look correct, but the "pop {lr}" instruction is now using
ARM::tPOP, which corresponds to encoding T1, which can only load into
registers r0-r7 and pc. (The reason of the difference between the Thumb1
push and pop instructions is that it can be used for efficient function
epilogues, popping the lr straight back into the pc.) I see two options:
* Use encoding T3, which can be any register except for sp. This is a
Thumb2-only instruction, so this code could not run on Thumb1-only
processors. I think this is roughly what you had before.
* Use POP encoding T1 to pop into r4, followed by MOV encoding T1 to move
from r4 to lr. Both of these instructions are available on Thumb1.

Since Thumb1FrameLowering inherits from ARMFrameLowering, your code is
currently being run for Thumb1, but emitting code which is not valid Thumb1.
Because of this, you will need to either ensure you only use Thumb1
instructions when generating code for Thumb1 or Thumb2, or add an
implementation of adjustForSegmentedStacks to Thumb1FrameLowering. It is
fine if this implementation just throws report_fatal_error for now, if you
don't want to support Thumb1 yet.

> > 7) I'm slightly confused as to why ARM and Thumb use different
> methods to
> > get the stack limit.  It seems to me that this should be decided
> based on
> > platform (e.g. linux or android), not instruction set, especially
> given that
> > ARM and Thumb code can be freely combined in the same executable.
> 
> According to the original author [1], apparently some Thumb units
> don't have the coprocessor that the ARM version uses. The location of
> the stack limit seems like it can be all over the place and is pretty
> dependent to the platform that it's running on, so I'm not quite sure
> what the best way is to reconcile all these without adding a zillion
> configuration options.

This may not be too important an issue at the moment, as rust does not
support thumb (as far as I can tell, [2] does not contain thumb and the arm
directory does not have any thumb assembly). Do you know of anyone using
this for thumb, and what platform they are targeting? However, switching
based on the instruction set (or any other hardware feature) is really not
the right way to do this, as this is a platform decision.

> [1] - https://github.com/rust-lang/llvm/pull/4
[2] - https://github.com/mozilla/rust/tree/master/src/rt/arch