[llvm-dev] [RFC] Improving compact x86-64 compact unwind descriptors
John Reagan via llvm-dev
llvm-dev at lists.llvm.org
Fri Jan 26 07:54:53 PST 2018
Here is our proposal to extend/enhance the x86-64 compact unwind
descriptors to fully describe the prologue/epilogue for asynchronous
unwinding. I believe there are missing/lacking CFI directives as well,
but I'll save that for another thread.
Asynchronous Compact Unwind Descriptors
Ron Brender, VMS Software, Inc.
Revised January 25, 2018
1 Introduction
This document proposes means to extend so-called compact unwind
descriptors to support fully
asynchronous exception handling. This will make extended compact unwind
descriptors an
alternative to DWARF CFI (call frame information) for achieving
asynchronous exception
support.
Compact unwind descriptors can and have been used in both 64- and 32-bit
environments.
However, this proposal addresses only 64-bit environments. While the
ideas presented here can
be readily adapted for use in a 32-bit environment, for simplicity this
document makes no
attempt to do so.
There are generally three kinds of information that together form the
heart of modern software
exception handling systems:
1. Information that is used to divide the remaining unwind information
into groups that are
specific to particular regions of memory (often, but not
necessarily, associated with a
single function) as well as provide a way to efficiently search for
and identify the
grouping that is associated with a particular address in memory.
2. Information that can be used to virtually or actually unwind from
the call frame of an
executing function to the call frame of its caller (at the point of
the call).
3. Identification of an associated personality routine that is invoked
by the general exception
handling mechanization to guide how processing should proceed for a
given function, as
well as additional “language specific data” needed for the
personality routine to do its
job. Note that the personality routine and its related data are
specified as an adjunct to
compiled code and totally opaque to the general mechanism (other
than the specified
interface).
Note in particular that C++ exception handling is built on top of a
personality routine and
language specific data area ABI that itself can be implemented using
either DWARE CFI or
extended compact unwind information as described here. The choice
between the two is
transparent to C++.
2 Compact Unwind Overview
This section provides a brief overview of key features of the LLVM
compact unwind design. It
does not attempt a comprehensive re-statement of all aspects of the
design except to the extent
necessary to motivate and understand later proposed changes and
enhancements.
2.1 Compact Unwind Group Description
A compact unwind group consist of five fields, as follows:
| 63 32 31 0 |
|-------------------------------------------------------------------|
| STARTING-ADDRESS |
| LENGTH | COMPACT-UNWIND-DESCRIPTION |
| PERSONALITY-FUNCTION-POINTER |
| LANGUAGE-SPECIFIC-DATA-ADDRESS |
|-------------------------------------------------------------------|
STARTING-ADDRESS (64-bits) is the lowest address of a region of memory
occupied by some
code, typically the entry point of a function.
LENGTH (32-bits) is the number of bytes included in this group,
typically including all and only
the code of a function.
COMPACT-UNWIND-DESCRIPTION (32-bits) is a description of the fully
formed frame of a
function and how to unwind it. This is described further following.
PERSONALITY-FUNCTION-POINTER (64-bit) is a pointer to the personality
routine.
LANGUAGE-SPECIFIC-DATA-ADDRESS (64-bits, sometimes abbreviated LSDA) is a
pointer to some data to be passed to the personality routine when it is
called.
A key observation is that the starting address plus length way of
describing a group means that
the set of groups for a compilation unit need not describe all of the
code in that unit. In
particular, it appears to be expected that no unwind information need be
generated for leaf
functions.
On the other hand, it is reasonable to expect that the groups that are
emitted are ordered by the
starting address. This means that a simple and fast binary search can be
used to map an address
to the group that applies to that address, if any.
It is useful to note that the run-time representation of unwind
information can vary from little
more than a simple concatenation of the compile-time information to a
substantial rewriting of
unwind information by the linker. The proposal favors simple
concatenation while maintaining
the same ordering of groups as their associated code.
2.2 Compact Unwind Frame Description
A compact unwind frame description describes a frame in sufficient
detail to be able to unwind
that frame to the frame of its caller.
| 31 28 27 24 23 0 |
|-------------------------------------------------------------------|
| FLAGS | MODE | |
|-------------------------------------------------------------------|
At the top most level, there are four bits that are not of further
interest here. Interpretation of
these bits is neither used nor changed.
Also at the top-level is a 4-bit mode field. This is the tag of a
discriminated (tagged, variant)
union that selects the interpretation of the remaining 24 bits.
Of the 16 possible modes, only 5 are defined :
Code Meaning Description
0 Old “Old” is presumed to refer to some historical
design that is no longer of interest.
It is treated here as Reserved.
1 RBP-based The frame uses the RBP register as a frame pointer.
The size of the frame can
frame vary during execution.
2 RSP-based The frame uses RSP as the frame pointer. The size
of the frame is fixed (at
frame compilation time).
3 Large RSP- The frame uses RSP as the frame pointer, The size
of the frame is fixed (at
based frame compilation time); however, that size is too large
to express within this 32-bit
descriptor encoding.
4 DWARF The frame, for whatever reason, cannot be
adequately described using the
escape compact unwind frame description. The remaining
24-bits are an index into
what the DWARF standard calls the .debug_frame
section (__eh_frame in
LLVM).
2.2.1 RBP-based Frame (MODE=1)
For a RBP-based frame, the remaining 24 bits are encoded as follows:
| 23 16 | 15 | 14 0 |
|-------------------------------------------------------------------|
| OFFSET | 0 | REGS |
|-------------------------------------------------------------------|
In a RBP-based frame the RBP register is pushed on the stack immediately
after the return
address, then RSP is moved to RBP. To unwind, RSP is restored with the
current RPB value,
then RBP is restored by popping off the stack, and the return is done by
popping the stack once
more into the instruction pointer.
All preserved registers are saved in a small range in the stack that
starts at RBP-8 to RBP-2040.
The offset/8 relative to RBP is encoded in the 8-bit OFFSET field. The
registers saved are
encoded in the 15-bit REGS field as five 3-bit entries.
2.2.2 RSP-Based Frame (MODE=2)
For a RSP-based frame, the remaining 24 bits are encoded as follows:
|23 16 | 15 13 | 12 10 | 9 0 |
|-------------------------------------------------------------------|
| SIZE | | CNT | REG_PERM |
|-------------------------------------------------------------------|
In a RSP-based frame the stack pointer serves directly as the frame
pointer and RBP is available
for use as a general register. Upon entry, the stack pointer is
decremented by 8*SIZE bytes (the
maximum stack allocation is thus 2040 bytes). To unwind, the stack size
is added to the stack
pointer, and completed by popping the stack once more into the
instruction pointer.
All preserved registers are saved on the stack immediately after the
return address. The number
of registers saved (up to 6) is encoded in the 3-bit CNT field. The
11-bit REG_PERM field
encodes which registers were saved and in what order.
2.2.3 Large RSP-Based Frame (MODE=3)
For a large RSP-based frame, the remaining 24 bits are encoded as follows:
|23 16 | 15 13 | 12 10 | 9 0 |
|-------------------------------------------------------------------|
| SIZE | ADJ | CNT | REG_PERM |
|-------------------------------------------------------------------|
This case is like the previous, except the stack size is too large to
encode in the compact unwind
encoding. Instead, the function must include a "subq $nnnnnnnn, RSP"
instruction in its
prologue to allocate the stack. The offset from the entry point of the
function to the nnnnnnnn
value in the function is given in the SIZE field.
Depending on the exact instructions used to save registers (PUSH versus
MOV), the nnnnnnnn
value in the instruction stream may not be quite the full stack size.
ADJ * 8 is the additional
adjustment needed to get the actual size.
2.2.4 DWARF Escape (MODE=4)
The frame, for whatever reason, cannot be adequately described using a
compact unwind frame
description. The remaining 24-bits are an index into what the DWARF
standard calls the
.debug_frame section (called __eh_frame in LLVM).
3 Asynchronous Changes and Enhancements
It is immediately obvious that omission of unwind information for leaf
functions (with any kind
of frame) precludes handling an exception that might occur during its
execution. It follows that
unwind information must cover all of the code of a module (with one
exception discussed
below). But if successive unwind groups are ordered (as previously
assumed) and also leave no
gaps, then the LENGTH field is redundant and can be omitted. The
beginning address of a
following group is always one byte past the end of the predecessor
group. There remains only the
question of how to identify the last group of a set.
It should also be clear that the unwind representation described in the
prior section is not
sufficient to unwind from an asynchronous exception that might occur in
either the prologue or
epilogue of a function. To see this consider what would happen if an
exception occurred at either
the entry point or the return instruction of either a RBP- or RSP-frame
function. To be able to
handle asynchronous exceptions at any point during function execution,
it is necessary to add
additional information to each unwind group.
These two considerations can be combined. The result is simply to
repurpose the LENGTH field
to encode prologue and epilogue information.
3.1 Extended MODEs
To preserve backward compatibility and to allow intermixing of
traditional and extended
compact unwind groups, new MODEs are defined as follows:
Code Meaning Description
----------------------------------------------------------------------------------------------
8 Null frame See below.
9 Extended RBP-based frame Like mode 1 but combined with
extended prologue/epilogue
information in place of a LENGTH field
10 Extended RSP-based frame Like mode 2 but combined with
extended prologue/epilogue
information in place of a LENGTH field
11 Extended Large RSP-based Like mode 3 but combined with
extended prologue/epilogue
frame information in place of a LENGTH field
12 Extended parameter(s) See below.
A null frame (MODE = 8) is the simplest possible frame, with no
allocated stack of either kind
(hence no saved registers). A null frame might be considered just a
special case of a RSP-based
frame with a stack size of zero. But unlike other frames, its frame
pointer is usually not 16-byte
aligned.
It appears technically feasible for a null frame function to have a
personality routine. However,
the utility of such a capability seems too meager to justify allowing
this. We propose to not
support this.
There remains the question of whether it is necessary or at least
desirable to strictly apply the
requirement that “all” code be covered by an unwind group. Based on
successful experience in
OpenVMS on the Itanium architecture (as well as Windows systems we
think), this alternative is
proposed:
If the first attempt to lookup an unwind group for an exception
address fails, then it is
(tentatively) assumed to have occurred within a null frame function
or in a part of a function
that is adequately described by a null frame. The presumed return
address is (virtually or
actually) popped from the top of stack and looked up. This second
attempted lookup must
succeed, in which case processing continues normally. A failure is a
fatal error.
This concept of a “default null frame group” is especially convenient
for dealing with
disconnected code sequences such as trampolines, PLTs, and the like.
The extended RBP-based and RSP-based frame modes (MODEs = 9 and 10) are
simple
supersets of their more traditional predecessors.
The large RSP-based frame mode (MODE = 11) is also a superset of mode 3
except that instead
of finding the stack size in the instruction stream, it is found in a
preceding extended parameter
unwind group (MODE = 12). This difference is essential for support of
execute-only code.
To understand the extended parameter group, suppose that two groups
occur one after the other
but have the same STARTING-ADDRESS value. A binary search using the
STARTING-
ADDRESS field will ignore the first of the pair because the address
being sought can never be at
least as large as the first but less than the second. Such a group can
be used as an escape
convention that allows inserting additional information into the
sequence of groups that
otherwise cannot be easily included.
The extended modes are described more fully in the following sections.
3.2 Function Parts
Functions are generally considered to consist of three distinct parts:
1. A prologue, which allocates local storage, saves registers that must
be preserved for the
benefit of callers, and possibly other housekeeping (such as
initializing any local state
necessary for the correct management of the function’s execution).
2. A body, which performs the main work of the function.
3. An epilogue, which makes the result available to the caller, and
undoes the effects of the
prologue (restores preserved registers, deallocates local storage,
and so on).
Here a word of caution is in order. This vocabulary tends to be applied
at multiple levels in
software architecture and each level has its own set of issues to
consider and resulting
requirements to be observed.
Note that a null frame function has no distinct prologue, body or
epilogue. Every instruction can
be viewed as simultaneously in all three parts, or in none of them.
This leads to the following proposal for the upper part of the extended
compact unwind
quadword for use in combination with MODE = 8 in the lower part.
| 63 48 | 47 32 |
|-------------------------------------------------------------------|
| RESERVED | 0 ... 0 |
|-------------------------------------------------------------------|
| 31 16 | 15 0 |
|-------------------------------------------------------------------|
| 0 0 0 0 | 1 0 0 0 | 0 ... 0 | 0 ... 0 |
|-------------------------------------------------------------------|
3.2.1 Function Prologue
For the purposes of exception handling, the key steps are:
1. Allocate space on the stack in which to save preserved registers and
for other purposes
2. Save the registers that must be preserved.
The DWARF call frame information (CFI) provides a description that is
precise and accurate at
each and every instruction in a function (not limited to just the
prologue). But experience shows
that this representation is bulky in space and expensive in time to
decode and interpret. Indeed,
compact unwind descriptors as described in Section 2 were created to
avoid these issues.
Experience with OpenVMS on the Alpha and Itanium architectures shows
that a key constraint
can greatly simplify the unwind description: require that no preserved
register can be changed
until all of them have been saved. The last such save ends the prologue
and the next instruction
begins the body. An unwind does not need (must not) restore any
preserved registers because
they are still valid. A simple length from the entry point suffices to
describe this boundary.
[The body does not necessarily need to begin until at least one of the
preserved registers is
actually changed.]
3.2.1.1 RSP-Based Frame (MODE = 10)
For a RSP-based frame, it is also necessary to know which instruction
adjusts the stack pointer:
before this instruction an unwind must only (virtually or actually) pop
the return address; and
after this instruction, the stack pointer must first be incremented to
deallocate the stack frame,
then the instruction pointer restored.
This leads to the following proposal for the upper part of the extended
compact unwind
quadword for use in combination with MODE = 10 in the lower part.
| 63 48 47 46 40 39 32 |
|-------------------------------------------------------------------|
| RESERVED | | PROLOGUE-SIZE2 | PROLOGUE-SIZE1 |
|-------------------------------------------------------------------|
Except for the MODE, the lower part is exactly like it would be for MODE
= 2.
PROLOGUE-SIZE1 is the length in bytes of the first part of the prologue
relative to the
STARTING-ADDRESS, up to and including the instruction that allocates the
stack.
PROLOGUE-SIZE2 is the length in bytes of the second part of the prologue
relative to the end
of the first part, up to a point after the last preserved register is
saved and before the first
preserved register is changed. (Recall, this point may not be unique.)
PROLOGUE-SIZE1 + PROLOGUE-SIZE2 gives the total size of the prologue.
The maximum prologue size allowed here is much greater than will be
typical. This is deliberate
to allow compilers freedom to use shrink-wrap type optimizations in
which safe operations that
need no body state are moved from the body to the beginning of the
prologue. This avoids the
cost of stack frame setup and teardown for simple special case handling
that often leads to an
early exit.
3.2.1.2 Large RSP-based Frame (MODE = 11)
This case is like the previous, except the stack size is too large to
encode in the compact unwind
encoding. The SIZE field is interpreted as an offset relative to the
beginning of the containing
unwind group to (what is necessarily) an earlier extended parameter group.
3.2.1.3 RBP-Based Frame (MODE = 9)
The RBP-based frame is the same as for a RSP-based frame except that
there are two instructions
that mark the transition between the two parts of the prologue: the push
of the prior RBP value
onto the stack followed by the copy of RSP to RBP to establish the new
frame pointer.
There appears to be no reason to not make this simplifying requirement:
the push and copy
always occur together.
This leads to the following proposal for the upper part of the extended
compact unwind
quadword for use in combination with MODE = 9 in the lower part.
| 63 48 47 46 40 39 32 |
|-------------------------------------------------------------------|
| RESERVED | | PROLOGUE-SIZE2 | PROLOGUE-SIZE1 |
|-------------------------------------------------------------------|
Except for the MODE, the lower part is exactly like it would be for MODE
= 1.
PROLOGUE-SIZE1 is the length in bytes of the first part of the prologue
relative to the
STARTING-ADDRESS, up to and including the instruction that pushes the
prior RBP contents.
PROLOGUE-SIZE2 is the length in bytes of the second part of the prologue
relative to the end
of the first part, up to a point after the last preserved register is
saved and before the first
preserved register is changed. (Recall, this point may not be unique.)
PROLOGUE-SIZE1 + PROLOGUE-SIZE2 gives the total size of the prologue.
3.2.2 Function Epilogue
For the purposes of exception handling, the key steps in an epilogue are:
1. Restore the registers that were preserved
2. Deallocate space on the stack that was used to preserve registers
and for other purposes
A key observation is that restoring registers is idempotent. That is, if
in the midst of restoring
registers when an exception occurs, it will do no harm if all of the
preserved registers are
restored again when unwinding. In effect, restoration of registers need
not be distinguished from
the body of a function. It is not until reaching the first instruction
that deallocates stack is
executed that the “real” epilogue begins.
3.2.2.1 RSP-Based Frame (MODE = 10 or 11)
For a RSP-based frame, whether small or large, the only unwind action
remaining after stack
deallocation is to pop the return address into the instruction pointer
(RIP). In the vast majority of
cases this means that the epilogue of the code described by an unwind
group is just a 1-byte RET
instruction that occurs at the end of the unwind group. There are other
possibilities but they are
rare and con be dealt with separately (see below).
It suffices to include a single 1-bit field, E, in the upper part of the
extended compact unwind
description:
| 63 48 47 46 40 39 32 |
|-------------------------------------------------------------------|
| RESERVED | E | PROLOGUE-SIZE2 | PROLOGUE-SIZE1 |
|-------------------------------------------------------------------|
The E flag indicates that the unwind group ends with a standard 1-byte
RET instruction.
3.2.2.2 RBP-Based Frame (MODE = 9)
The RBP-based frame is much the same as for a RSP-based frame except
that there are two
instructions that mark the transition between the two parts of the
prologue: the copy of RBP to
RSP to cut back the stack, followed by the pop of RBP to restore what
may be the callers frame
pointer. Thus the epilogue, for the purposes of unwinding, begins
immediately after the copy of
RBP to RSP. In the vast majority of cases, the immediately following two
instructions will
consist of POP RBP and RET. The POP is not idempotent because it changes
the stack pointer,
but the unwind processing code can distinguish whether the stack has
been popped or not based
on the address (3 versus 1 byte less than the highest address of the group).
It suffices to include a single 1-bit field, E, in the upper part of the
extended compact unwind
description:
| 63 48 47 46 40 39 32 |
|-------------------------------------------------------------------|
| RESERVED | E | PROLOGUE-SIZE2 | PROLOGUE-SIZE1 |
|-------------------------------------------------------------------|
The E flag indicates that the unwind group ends with a standard
3-instruction return sequence.
3.3 Special Issues
Up until this point, it might appear that each unwind group corresponds
one-for-one to the code
for a single function. While common, this is not required. Important
variations are illustrated
here.
Issue/Problem Possible Resolution
-------------------------------------------------------------------------------------------------
The first part of the prologue is too large to be Use two unwind groups:
described by the 8-bit RxP-FRAME-SIZE1 field. 1) a null frame
group of any appropriate size,
2) a suitable
RxP-based frame.
The second part of the prologue is too large to be Use three unwind
groups:
described by the 8-bit RxP-FRAME-SIZE2 field. 1) a RSP-frame
group to describe the first part of
the prologue only
(PROLOGUE-SIZE1 = the
length of the
entire group and PROLOGUE-SIZE2
= 0,
2) a RSP-frame
group to describe the second part
of the prologue
(PROLOGUE-SIZE1 & 2 both = 0
and no registers
saved),
3) a suitable
RxP-based frame group (with
PROLOGUE-SIZE1 & 2
both = 0).
There is more than one return location (without a Generally use one
unwind group for each sequence
shared epilogue sequence). of code ending with
a RET. A group following a
RET will have no
prologue (PROLOGUE-SIZE1
& 2 both = 0).
There is more than one entry point to a function Generally use one
unwind group for each sequence
(for example, a FORTRAN multiple entry point of code beginning
at an entry point. Each group
subroutine or analogous assembly language may have a normal
prologue and all but the last
equivalents) might have NO
epilogue (E flag clear).
These examples are not exhaustive, of course. But they illustrate the
flexibility of the
mechanism, which should be suitable for the vast majority of cases in
practice.
3.4 Number of Unwind Groups
A simple concatenation of unwind groups by the linker that combines
unwind group sections
from multiple modules does not, of itself, provide direct information to
the unwind software
regarding how many unwind groups are present. However, the image
information (ELF
segments) should suffice to determine this number based on the size of
the segment and the
known size of each group. Alternatively, the linker might create special
symbols that can be used
to help determine the number of groups.
This proposal leaves this detail to target-specific linker, image loader
and exception handling
system to specify.
3.5 Interaction with Code Generation
The relatively simple and fixed size nature of the extended compact
unwind information
proposed here depends on observing certain restrictions on optimization
that might affect code in
prologues and epilogues. These restrictions were noted where relevant
throughout this section.
Here is a summary of those requirements:
1. Use of any preserved register must be delayed until all of the
preserved registers have
been saved.
2. In the prologue of a function with a RBP-based frame, the two
instructions that save the
prior RBP contents and copy the stack pointer to RBP must be kept
adjacent. Similarly,
in the epilogue of a function with a RBP-based frame, the two
instructions that cut back
the stack and restore the prior RBP contents must be kept adjacent.
[This adjacency
requirement could be relaxed by introducing an additional MODE that
corresponds to having
the RBP saved in memory but not yet updated from RSP. This does not
seem worthwhile.]
3. For shrink wrap optimization in the presence of a
handler/personality routine, care must
be taken to not move instructions into the prologue that might cause
an exception.
Floating point exceptions and access violations are of particular
concern in this regard.
4 System Specific Extensions for OpenVMS
Omitted since nobody other than OpenVMS should care but it describes
adding an additional
four byte extension to the extended compact unwind group to pass along
some information from
the VAX Macro-32 compiler about VAX register usage.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180126/54f5c06d/attachment.sig>
More information about the llvm-dev
mailing list