<div dir="ltr"><div><div><div><div>Hi Nick,<br><br></div>It is a pleasure to be in contact with the creator of the compact unwind approach!<br><br></div>I can see how an array of 32-bit unwind blocks could be used to describe each distinct point within a function (within a prolog in particular). But then you end up with six or seven or more such blocks for a large percentage of functions, don't you? Seems like a lot of additional space for something that is usually simple, compact (sic) and idiomatic and dilutes the original benefit. We are seeking a summary description in a small fixed size to supplement the base unwind block.<br><br></div><div>Since you mention it, why have the 

UNWIND_IS_NOT_FUNCTION_START


flag? Seems like you don't need it for unwinding purposes. Especially in a function that has a code layout where the entry point is not the first/lowest addressed instruction of the function (which I have seen in some GEM produced code for Alpha). An aside, but just curious.<br></div><div><br></div>Not being a Mac guy, the macOS two-level lookup scheme has always been rather a mystery to me. What little I know is gleaned from the compact_unwind_encoding.h file found around the net. But I have never seen the .c/.cxx/.cpp that goes with it to better understand how the mapping works. I always figured that was considered proprietary, but maybe I just don't know where/how to look. Is there public code and/or design note information to look at?<br><br></div>You are correct that this paper tends toward trying to create a scheme with little or no linker involvement. There are good reasons to ask more of the linker for better space economy, so this is still under discussion. <br><div><div><div><br></div><div>You are also correct that the unwinder needs to interpret the additional prolog/epilog information. But the rest of your comment has me confused: the extra 2 bytes is just a widening as you suggest, right? And there is no Section 5.1--did you mean 3.1?<br><br></div><div>Thanks again,<br></div><div>Ron<br></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 26, 2018 at 9:54 PM, Nick Kledzik via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">John and Ron,<br>

<br>

I developed the original compact unwind implementation for macOS 10.6 back in 2009.  I tried to leave space in the design to support finer-grained exception handling, such as for asynchronous exceptions or for the shrink-wrap optimization.  The idea I had at the time was that, instead of having just one 32-bit compact unwind info per function, there could be an array of them, each covering a different range. All but the first would have the UNWIND_IS_NOT_FUNCTION_START bit set.<br>

<br>

On macOS, the linker takes the 32-byte “compact unwind group description” records and folds them down into a completely different (read-only) runtime table that basically maps an offset in the DSO into a 32-bit compact info for that address.<br>

<br>

It seems like from your proposal that your linker keeps the “compact unwind group description” records as is and just concatenates them, and the runtime would need to understand the prolog/epilog encoding.  I would think that just widening the group record (as you suggest in 5.1) would be much simpler (like SHT_REL was widened to SHT_RELA).<br>

<br>

-Nick<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

> On Jan 26, 2018, at 7:54 AM, John Reagan via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

><br>

> Here is our proposal to extend/enhance the x86-64 compact unwind<br>

> descriptors to fully describe the prologue/epilogue for asynchronous<br>

> unwinding.  I believe there are missing/lacking CFI directives as well,<br>

> but I'll save that for another thread.<br>

><br>

><br>

> Asynchronous Compact Unwind Descriptors<br>

> Ron Brender, VMS Software, Inc.<br>

> Revised January 25, 2018<br>

><br>

> 1  Introduction<br>

> This document proposes means to extend so-called compact unwind<br>

> descriptors to support fully<br>

> asynchronous exception handling. This will make extended compact unwind<br>

> descriptors an<br>

> alternative to DWARF CFI (call frame information) for achieving<br>

> asynchronous exception<br>

> support.<br>

><br>

> Compact unwind descriptors can and have been used in both 64- and 32-bit<br>

> environments.<br>

> However, this proposal addresses only 64-bit environments. While the<br>

> ideas presented here can<br>

> be readily adapted for use in a 32-bit environment, for simplicity this<br>

> document makes no<br>

> attempt to do so.<br>

><br>

> There are generally three kinds of information that together form the<br>

> heart of modern software<br>

> exception handling systems:<br>

><br>

>  1. Information that is used to divide the remaining unwind information<br>

> into groups that are<br>

>     specific to particular regions of memory (often, but not<br>

> necessarily, associated with a<br>

>     single function) as well as provide a way to efficiently search for<br>

> and identify the<br>

>     grouping that is associated with a particular address in memory.<br>

><br>

>  2. Information that can be used to virtually or actually unwind from<br>

> the call frame of an<br>

>     executing function to the call frame of its caller (at the point of<br>

> the call).<br>

><br>

>  3. Identification of an associated personality routine that is invoked<br>

> by the general exception<br>

>     handling mechanization to guide how processing should proceed for a<br>

> given function, as<br>

>     well as additional “language specific data” needed for the<br>

> personality routine to do its<br>

>     job. Note that the personality routine and its related data are<br>

> specified as an adjunct to<br>

>     compiled code and totally opaque to the general mechanism (other<br>

> than the specified<br>

>     interface).<br>

><br>

> Note in particular that C++ exception handling is built on top of a<br>

> personality routine and<br>

> language specific data area ABI that itself can be implemented using<br>

> either DWARF CFI or<br>

> extended compact unwind information as described here. The choice<br>

> between the two is<br>

> transparent to C++.<br>

><br>

><br>

> 2  Compact Unwind Overview<br>

> This section provides a brief overview of key features of the LLVM<br>

> compact unwind design. It<br>

> does not attempt a comprehensive re-statement of all aspects of the<br>

> design except to the extent<br>

> necessary to motivate and understand later proposed changes and<br>

> enhancements.<br>

><br>

> 2.1  Compact Unwind Group Description<br>

> A compact unwind group consists of five fields, as follows:<br>

><br>

> | 63                             32 31                            0 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | STARTING-ADDRESS                                                  |<br>

> | LENGTH                           | COMPACT-UNWIND-DESCRIPTION     |<br>

> | PERSONALITY-FUNCTION-POINTER                                      |<br>

> | LANGUAGE-SPECIFIC-DATA-ADDRESS                                    |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

> STARTING-ADDRESS (64-bits) is the lowest address of a region of memory<br>

> occupied by some<br>

> code, typically the entry point of a function.<br>

><br>

> LENGTH (32-bits) is the number of bytes included in this group,<br>

> typically including all and only<br>

> the code of a function.<br>

><br>

> COMPACT-UNWIND-DESCRIPTION (32-bits) is a description of the fully<br>

> formed frame of a<br>

> function and how to unwind it. This is described further following.<br>

><br>

> PERSONALITY-FUNCTION-POINTER (64-bit) is a pointer to the personality<br>

> routine.<br>

><br>

> LANGUAGE-SPECIFIC-DATA-ADDRESS (64-bits, sometimes abbreviated LSDA) is a<br>

> pointer to some data to be passed to the personality routine when it is<br>

> called.<br>

><br>
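As a rough illustration of the five fields above, the following sketch packs and unpacks one 32-byte group as four 64-bit quadwords, with LENGTH in the upper half and the descriptor in the lower half of the second quadword, as the diagram shows. The little-endian byte order and the names `pack_group` and `unpack_group` are assumptions made for this example, not part of the proposal.

```python
import struct

# Four 64-bit quadwords; little-endian byte order is an assumption.
GROUP = struct.Struct("<QQQQ")

def pack_group(start, length, descriptor, personality, lsda):
    # Second quadword: LENGTH in bits 63..32, descriptor in bits 31..0.
    word1 = ((length & 0xFFFFFFFF) << 32) | (descriptor & 0xFFFFFFFF)
    return GROUP.pack(start, word1, personality, lsda)

def unpack_group(raw):
    start, word1, personality, lsda = GROUP.unpack(raw)
    return {
        "start": start,
        "length": word1 >> 32,
        "descriptor": word1 & 0xFFFFFFFF,
        "personality": personality,
        "lsda": lsda,
    }
```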

> A key observation is that the starting address plus length way of<br>

> describing a group means that<br>

> the set of groups for a compilation unit need not describe all of the<br>

> code in that unit. In<br>

> particular, it appears to be expected that no unwind information need be<br>

> generated for leaf<br>

> functions.<br>

><br>

> On the other hand, it is reasonable to expect that the groups that are<br>

> emitted are ordered by the<br>

> starting address. This means that a simple and fast binary search can be<br>

> used to map an address<br>

> to the group that applies to that address, if any.<br>
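The lookup described above might be sketched as follows, assuming the groups are held as a list of hypothetical `(start, length)` tuples already sorted by starting address. Because gaps are allowed in the traditional scheme, the length check is still needed after the search.

```python
from bisect import bisect_right

def find_group(groups, address):
    """Return the (start, length) group covering address, or None."""
    starts = [g[0] for g in groups]
    # Index of the last group whose start is <= address.
    i = bisect_right(starts, address) - 1
    if i < 0:
        return None
    start, length = groups[i]
    # The address may fall into a gap after the group.
    return groups[i] if address < start + length else None
```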

><br>

> It is useful to note that the run-time representation of unwind<br>

> information can vary from little<br>

> more than a simple concatenation of the compile-time information to a<br>

> substantial rewriting of<br>

> unwind information by the linker. The proposal favors simple<br>

> concatenation while maintaining<br>

> the same ordering of groups as their associated code.<br>

><br>

> 2.2  Compact Unwind Frame Description<br>

> A compact unwind frame description describes a frame in sufficient<br>

> detail to be able to unwind<br>

> that frame to the frame of its caller.<br>

><br>

> | 31    28 27    24 23                                            0 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | FLAGS   | MODE   |                                                |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

> At the top most level, there are four bits that are not of further<br>

> interest here. Interpretation of<br>

> these bits is neither used nor changed.<br>

><br>

> Also at the top-level is a 4-bit mode field. This is the tag of a<br>

> discriminated (tagged, variant)<br>

> union that selects the interpretation of the remaining 24 bits.<br>

><br>

> Of the 16 possible modes, only 5 are defined:<br>

><br>

> Code    Meaning                Description<br>

> 0       Old                    “Old” is presumed to refer to some historical design that is no longer of interest. It is treated here as Reserved.<br>

> 1       RBP-based frame        The frame uses the RBP register as a frame pointer. The size of the frame can vary during execution.<br>

> 2       RSP-based frame        The frame uses RSP as the frame pointer. The size of the frame is fixed (at compilation time).<br>

> 3       Large RSP-based frame  The frame uses RSP as the frame pointer. The size of the frame is fixed (at compilation time); however, that size is too large to express within this 32-bit descriptor encoding.<br>

> 4       DWARF escape           The frame, for whatever reason, cannot be adequately described using the compact unwind frame description. The remaining 24 bits are an index into what the DWARF standard calls the .debug_frame section (__eh_frame in LLVM).<br>

><br>
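The top-level split of the 32-bit descriptor into FLAGS, MODE, and the 24-bit mode-specific payload can be sketched as below. The function name `split_descriptor` and the short mode labels are illustrative choices for this example only.

```python
# Short labels for the five defined modes (names are illustrative).
MODE_NAMES = {0: "old", 1: "rbp", 2: "rsp", 3: "large-rsp", 4: "dwarf"}

def split_descriptor(desc):
    flags = (desc >> 28) & 0xF      # bits 31..28
    mode = (desc >> 24) & 0xF       # bits 27..24, the union tag
    payload = desc & 0x00FFFFFF     # bits 23..0, meaning depends on mode
    return flags, mode, payload
```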

> 2.2.1  RBP-based Frame (MODE=1)<br>

> For a RBP-based frame, the remaining 24 bits are encoded as follows:<br>

><br>

> | 23        16 | 15 | 14                                          0 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | OFFSET       | 0  | REGS                                          |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

> In a RBP-based frame the RBP register is pushed on the stack immediately<br>

> after the return<br>

> address, then RSP is moved to RBP. To unwind, RSP is restored with the<br>

> current RBP value,<br>

> then RBP is restored by popping off the stack, and the return is done by<br>

> popping the stack once<br>

> more into the instruction pointer.<br>

><br>

> All preserved registers are saved in a small range in the stack that<br>

> starts at RBP-8 to RBP-2040.<br>

> The offset/8 relative to RBP is encoded in the 8-bit OFFSET field. The<br>

> registers saved are<br>

> encoded in the 15-bit REGS field as five 3-bit entries.<br>
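A minimal sketch of decoding the MODE=1 payload follows. It assumes that the first of the five 3-bit entries sits in the lowest bits of REGS and that an entry value of 0 means "no register"; the text above does not spell out either detail, so treat both as assumptions.

```python
def decode_rbp_payload(payload):
    # OFFSET is stored divided by 8; the save area begins at RBP - offset_bytes.
    offset_bytes = ((payload >> 16) & 0xFF) * 8
    regs = payload & 0x7FFF
    # Five 3-bit register codes; entry 0 assumed to be in the low bits.
    slots = [(regs >> (3 * i)) & 0x7 for i in range(5)]
    return offset_bytes, slots
```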

><br>

> 2.2.2  RSP-Based Frame (MODE=2)<br>

> For a RSP-based frame, the remaining 24 bits are encoded as follows:<br>

><br>

> |23        16 | 15      13 | 12      10 | 9                       0 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | SIZE        |            | CNT        | REG_PERM                  |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

> In a RSP-based frame the stack pointer serves directly as the frame<br>

> pointer and RBP is available<br>

> for use as a general register. Upon entry, the stack pointer is<br>

> decremented by 8*SIZE bytes (the<br>

> maximum stack allocation is thus 2040 bytes). To unwind, the stack size<br>

> is added to the stack<br>

> pointer, and completed by popping the stack once more into the<br>

> instruction pointer.<br>

><br>

> All preserved registers are saved on the stack immediately after the<br>

> return address.  The number<br>

> of registers saved (up to 6) is encoded in the 3-bit CNT field. The<br>

> 10-bit REG_PERM field<br>

> encodes which registers were saved and in what order.<br>
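The field extraction for a MODE=2 payload can be sketched as follows, using the bit positions from the diagram above (SIZE in bits 23..16, CNT in bits 12..10, REG_PERM in bits 9..0). Decoding the permutation itself is left out; the dictionary keys are names chosen for this example.

```python
def decode_rsp_payload(payload):
    size = (payload >> 16) & 0xFF
    cnt = (payload >> 10) & 0x7
    reg_perm = payload & 0x3FF
    return {
        "stack_bytes": size * 8,   # SIZE counts 8-byte units
        "reg_count": cnt,
        "reg_perm": reg_perm,      # still encoded; not expanded here
    }
```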

><br>

> 2.2.3  Large RSP-Based Frame (MODE=3)<br>

> For a large RSP-based frame, the remaining 24 bits are encoded as follows:<br>

><br>

> |23        16 | 15      13 | 12      10 | 9                       0 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | SIZE        | ADJ        | CNT        | REG_PERM                  |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

> This case is like the previous, except the stack size is too large to<br>

> encode in the compact unwind<br>

> encoding.  Instead, the function must include a "subq $nnnnnnnn, RSP"<br>

> instruction in its<br>

> prologue to allocate the stack.  The offset from the entry point of the<br>

> function to the nnnnnnnn<br>

> value in the function is given in the SIZE field.<br>

><br>

> Depending on the exact instructions used to save registers (PUSH versus<br>

> MOV), the nnnnnnnn<br>

> value in the instruction stream may not be quite the full stack size.<br>

> ADJ * 8 is the additional<br>

> adjustment needed to get the actual size.<br>
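To make the MODE=3 arithmetic concrete, the sketch below reads the 32-bit immediate out of the function's code bytes at the offset given by SIZE and adds ADJ * 8. The parameter `code` (the function's bytes starting at its entry point) and the name `large_frame_size` are assumptions for the example.

```python
import struct

def large_frame_size(code, payload):
    """code: bytes of the function, starting at its entry point."""
    imm_offset = (payload >> 16) & 0xFF   # SIZE: offset of nnnnnnnn in the code
    adj = (payload >> 13) & 0x7           # ADJ: extra 8-byte units
    (imm,) = struct.unpack_from("<I", code, imm_offset)
    return imm + 8 * adj
```

For example, with a prologue that begins with `subq $0x1000, %rsp` (bytes `48 81 EC` followed by the immediate), SIZE would be 3.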

><br>

> 2.2.4  DWARF Escape (MODE=4)<br>

> The frame, for whatever reason, cannot be adequately described using a<br>

> compact unwind frame<br>

> description. The remaining 24-bits are an index into what the DWARF<br>

> standard calls the<br>

> .debug_frame section (called __eh_frame in LLVM).<br>

><br>

><br>

> 3  Asynchronous Changes and Enhancements<br>

> It is immediately obvious that omission of unwind information for leaf<br>

> functions (with any kind<br>

> of frame) precludes handling an exception that might occur during its<br>

> execution. It follows that<br>

> unwind information must cover all of the code of a module (with one<br>

> exception discussed<br>

> below). But if successive unwind groups are ordered (as previously<br>

> assumed) and also leave no<br>

> gaps, then the LENGTH field is redundant and can be omitted. The<br>

> beginning address of a<br>

> following group is always one byte past the end of the predecessor<br>

> group. There remains only the<br>

> question of how to identify the last group of a set.<br>

><br>

> It should also be clear that the unwind representation described in the<br>

> prior section is not<br>

> sufficient to unwind from an asynchronous exception that might occur in<br>

> either the prologue or<br>

> epilogue of a function. To see this consider what would happen if an<br>

> exception occurred at either<br>

> the entry point or the return instruction of either a RBP- or RSP-frame<br>

> function. To be able to<br>

> handle asynchronous exceptions at any point during function execution,<br>

> it is necessary to add<br>

> additional information to each unwind group.<br>

><br>

> These two considerations can be combined. The result is simply to<br>

> repurpose the LENGTH field<br>

> to encode prologue and epilogue information.<br>

><br>

> 3.1  Extended MODEs<br>

> To preserve backward compatibility and to allow intermixing of<br>

> traditional and extended<br>

> compact unwind groups, new MODEs are defined as follows:<br>

><br>

> Code    Meaning                         Description<br>

> ----------------------------------------------------------------------<br>

> 8       Null frame                      See below.<br>

> 9       Extended RBP-based frame        Like mode 1 but combined with extended prologue/epilogue information in place of a LENGTH field<br>

> 10      Extended RSP-based frame        Like mode 2 but combined with extended prologue/epilogue information in place of a LENGTH field<br>

> 11      Extended Large RSP-based frame  Like mode 3 but combined with extended prologue/epilogue information in place of a LENGTH field<br>

> 12      Extended parameter(s)           See below.<br>

><br>

> A null frame (MODE = 8) is the simplest possible frame, with no<br>

> allocated stack of either kind<br>

> (hence no saved registers).  A null frame might be considered just a<br>

> special case of a RSP-based<br>

> frame with a stack size of zero. But unlike other frames, its frame<br>

> pointer is usually not 16-byte<br>

> aligned.<br>

><br>

> It appears technically feasible for a null frame function to have a<br>

> personality routine. However,<br>

> the utility of such a capability seems too meager to justify allowing<br>

> this. We propose to not<br>

> support this.<br>

><br>

> There remains the question of whether it is necessary or at least<br>

> desirable to strictly apply the<br>

> requirement that “all” code be covered by an unwind group. Based on<br>

> successful experience in<br>

> OpenVMS on the Itanium architecture (as well as Windows systems we<br>

> think), this alternative is<br>

> proposed:<br>

><br>

>     If the first attempt to lookup an unwind group for an exception<br>

> address fails, then it is<br>

>     (tentatively) assumed to have occurred within a null frame function<br>

> or in a part of a function<br>

>     that is adequately described by a null frame. The presumed return<br>

> address is (virtually or<br>

>     actually) popped from the top of stack and looked up. This second<br>

> attempted lookup must<br>

>     succeed, in which case processing continues normally. A failure is a<br>

> fatal error.<br>
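The two-step lookup with a default null frame might look like the following sketch. The callables `find` (address to group or None) and `read_u64` (read 8 bytes of stack memory) are hypothetical stand-ins for whatever the runtime actually provides.

```python
def lookup_with_fallback(find, pc, read_u64, rsp):
    """Return (group, pc, rsp) to continue unwinding from."""
    group = find(pc)
    if group is not None:
        return group, pc, rsp
    # Assume a null frame: the return address is at the top of the stack.
    return_address = read_u64(rsp)
    group = find(return_address)
    if group is None:
        # The second lookup must succeed; otherwise this is fatal.
        raise LookupError("no unwind group for pc or its presumed caller")
    return group, return_address, rsp + 8
```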

><br>

> This concept of a “default null frame group” is especially convenient<br>

> for dealing with<br>

> disconnected code sequences such as trampolines, PLTs, and the like.<br>

><br>

> The extended RBP-based and RSP-based frame modes (MODEs = 9 and 10) are<br>

> simple<br>

> supersets of their more traditional predecessors.<br>

><br>

> The large RSP-based frame mode (MODE = 11) is also a superset of mode 3<br>

> except that instead<br>

> of finding the stack size in the instruction stream, it is found in a<br>

> preceding extended parameter<br>

> unwind group (MODE = 12). This difference is essential for support of<br>

> execute-only code.<br>

><br>

> To understand the extended parameter group, suppose that two groups<br>

> occur one after the other<br>

> but have the same STARTING-ADDRESS value. A binary search using the<br>

> STARTING-<br>

> ADDRESS field will ignore the first of the pair because the address<br>

> being sought can never be at<br>

> least as large as the first but less than the second. Such a group can<br>

> be used as an escape<br>

> convention that allows inserting additional information into the<br>

> sequence of groups that<br>

> otherwise cannot be easily included.<br>
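A small sketch shows why the first of two groups with equal STARTING-ADDRESS values is never selected: a search for the last group whose start is at or below the target always lands on the second one. The list `starts` and the name `find_index` are made up for this illustration.

```python
from bisect import bisect_right

def find_index(starts, address):
    # Index of the last entry whose starting address is <= address.
    return bisect_right(starts, address) - 1

# Index 1 plays the role of an extended parameter group: it shares its
# starting address with the real group at index 2.
starts = [0x1000, 0x2000, 0x2000, 0x3000]
```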

><br>

> The extended modes are described more fully in the following sections.<br>

><br>

> 3.2  Function Parts<br>

> Functions are generally considered to consist of three distinct parts:<br>

><br>

> 1. A prologue, which allocates local storage, saves registers that must<br>

> be preserved for the<br>

>    benefit of callers, and possibly other housekeeping (such as<br>

> initializing any local state<br>

>    necessary for the correct management of the function’s execution).<br>

><br>

> 2. A body, which performs the main work of the function.<br>

><br>

> 3. An epilogue, which makes the result available to the caller, and<br>

> undoes the effects of the<br>

>    prologue (restores preserved registers, deallocates local storage,<br>

> and so on).<br>

><br>

> Here a word of caution is in order. This vocabulary tends to be applied<br>

> at multiple levels in<br>

> software architecture and each level has its own set of issues to<br>

> consider and resulting<br>

> requirements to be observed.<br>

><br>

> Note that a null frame function has no distinct prologue, body or<br>

> epilogue. Every instruction can<br>

> be viewed as simultaneously in all three parts, or in none of them.<br>

><br>

> This leads to the following proposal for the upper part of the extended<br>

> compact unwind<br>

> quadword for use in combination with MODE = 8 in the lower part.<br>

><br>

> | 63                            48 | 47                          32 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | RESERVED                         | 0            ...             0 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

><br>

> | 31                            16 | 15                           0 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | 0 0 0 0 | 1 0 0 0 | 0   ...    0 | 0            ...             0 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

><br>

><br>

> 3.2.1  Function Prologue<br>

> For the purposes of exception handling, the key steps are:<br>

><br>

>  1. Allocate space on the stack in which to save preserved registers and<br>

> for other purposes<br>

>  2. Save the registers that must be preserved.<br>

><br>

> The DWARF call frame information (CFI) provides a description that is<br>

> precise and accurate at<br>

> each and every instruction in a function (not limited to just the<br>

> prologue). But experience shows<br>

> that this representation is bulky in space and expensive in time to<br>

> decode and interpret. Indeed,<br>

> compact unwind descriptors as described in Section 2 were created to<br>

> avoid these issues.<br>

><br>

> Experience with OpenVMS on the Alpha and Itanium architectures shows<br>

> that a key constraint<br>

> can greatly simplify the unwind description: require that no preserved<br>

> register can be changed<br>

> until all of them have been saved. The last such save ends the prologue<br>

> and the next instruction<br>

> begins the body. An unwind does not need (must not) restore any<br>

> preserved registers because<br>

> they are still valid. A simple length from the entry point suffices to<br>

> describe this boundary.<br>

> [The body does not necessarily need to begin until at least one of the<br>

> preserved registers is<br>

> actually changed.]<br>

><br>

> 3.2.1.1  RSP-Based Frame (MODE = 10)<br>

> For a RSP-based frame, it is also necessary to know which instruction<br>

> adjusts the stack pointer:<br>

> before this instruction an unwind must only (virtually or actually) pop<br>

> the return address; and<br>

> after this instruction, the stack pointer must first be incremented to<br>

> deallocate the stack frame,<br>

> then the instruction pointer restored.<br>

><br>

> This leads to the following proposal for the upper part of the extended<br>

> compact unwind<br>

> quadword for use in combination with MODE = 10 in the lower part.<br>

><br>

> | 63                       48 47 46             40 39            32 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | RESERVED                   |  |  PROLOGUE-SIZE2 |  PROLOGUE-SIZE1 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

><br>

> Except for the MODE, the lower part is exactly like it would be for MODE<br>

> = 2.<br>

><br>

> PROLOGUE-SIZE1 is the length in bytes of the first part of the prologue<br>

> relative to the<br>

> STARTING-ADDRESS, up to and including the instruction that allocates the<br>

> stack.<br>

><br>

> PROLOGUE-SIZE2 is the length in bytes of the second part of the prologue<br>

> relative to the end<br>

> of the first part, up to a point after the last preserved register is<br>

> saved and before the first<br>

> preserved register is changed. (Recall, this point may not be unique.)<br>

><br>

> PROLOGUE-SIZE1 + PROLOGUE-SIZE2 gives the total size of the prologue.<br>
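One way to read the two prologue sizes is as boundaries that sort a program counter into three phases. The sketch below extracts the fields from the second quadword (PROLOGUE-SIZE1 in bits 39..32, PROLOGUE-SIZE2 in bits 46..40, bit 47 as a flag) and classifies an offset from STARTING-ADDRESS. The phase names and the treatment of the boundaries are this example's interpretation, not wording from the proposal.

```python
def decode_upper(quadword):
    upper = quadword >> 32
    return {
        "size1": upper & 0xFF,          # bits 39..32
        "size2": (upper >> 8) & 0x7F,   # bits 46..40
        "flag47": (upper >> 15) & 1,    # bit 47
    }

def rsp_prologue_phase(pc_offset, size1, size2):
    if pc_offset < size1:
        return "entry"       # stack not yet allocated: pop return address only
    if pc_offset < size1 + size2:
        return "allocated"   # deallocate stack; preserved registers still valid
    return "body"            # full unwind, including register restore
```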

><br>

> The maximum prologue size allowed here is much greater than will be<br>

> typical. This is deliberate<br>

> to allow compilers freedom to use shrink-wrap type optimizations in<br>

> which safe operations that<br>

> need no body state are moved from the body to the beginning of the<br>

> prologue. This avoids the<br>

> cost of stack frame setup and teardown for simple special case handling<br>

> that often leads to an<br>

> early exit.<br>

><br>

> 3.2.1.2  Large RSP-based Frame (MODE = 11)<br>

> This case is like the previous, except the stack size is too large to<br>

> encode in the compact unwind<br>

> encoding.  The SIZE field is interpreted as an offset relative to the<br>

> beginning of the containing<br>

> unwind group to (what is necessarily) an earlier extended parameter group.<br>

><br>

> 3.2.1.3  RBP-Based Frame (MODE = 9)<br>

> The RBP-based frame is the same as for a RSP-based frame except that<br>

> there are two instructions<br>

> that mark the transition between the two parts of the prologue: the push<br>

> of the prior RBP value<br>

> onto the stack followed by the copy of RSP to RBP to establish the new<br>

> frame pointer.<br>

><br>

> There appears to be no reason to not make this simplifying requirement:<br>

> the push and copy<br>

> always occur together.<br>

><br>

> This leads to the following proposal for the upper part of the extended<br>

> compact unwind<br>

> quadword for use in combination with MODE = 9 in the lower part.<br>

><br>

> | 63                       48 47 46             40 39            32 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | RESERVED                   |  |  PROLOGUE-SIZE2 |  PROLOGUE-SIZE1 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

> Except for the MODE, the lower part is exactly like it would be for MODE<br>

> = 1.<br>

><br>

> PROLOGUE-SIZE1 is the length in bytes of the first part of the prologue<br>

> relative to the<br>

> STARTING-ADDRESS, up to and including the instruction that pushes the<br>

> prior RBP contents.<br>

><br>

> PROLOGUE-SIZE2 is the length in bytes of the second part of the prologue<br>

> relative to the end<br>

> of the first part, up to a point after the last preserved register is<br>

> saved and before the first<br>

> preserved register is changed. (Recall, this point may not be unique.)<br>

><br>

> PROLOGUE-SIZE1 + PROLOGUE-SIZE2 gives the total size of the prologue.<br>

><br>

> 3.2.2  Function Epilogue<br>

> For the purposes of exception handling, the key steps in an epilogue are:<br>

><br>

>  1. Restore the registers that were preserved<br>

>  2. Deallocate space on the stack that was used to preserve registers<br>

> and for other purposes<br>

><br>

> A key observation is that restoring registers is idempotent. That is, if<br>

> in the midst of restoring<br>

> registers when an exception occurs, it will do no harm if all of the<br>

> preserved registers are<br>

> restored again when unwinding. In effect, restoration of registers need<br>

> not be distinguished from<br>

> the body of a function. It is not until the first instruction<br>

> that deallocates stack is<br>

> executed that the “real” epilogue begins.<br>

><br>

> 3.2.2.1  RSP-Based Frame (MODE = 10 or 11)<br>

> For a RSP-based frame, whether small or large, the only unwind action<br>

> remaining after stack<br>

> deallocation is to pop the return address into the instruction pointer<br>

> (RIP). In the vast majority of<br>

> cases this means that the epilogue of the code described by an unwind<br>

> group is just a 1-byte RET<br>

> instruction that occurs at the end of the unwind group. There are other<br>

> possibilities but they are<br>

> rare and can be dealt with separately (see below).<br>

><br>

> It suffices to include a single 1-bit field, E, in the upper part of the<br>

> extended compact unwind<br>

> description:<br>

><br>

> | 63                       48 47 46             40 39            32 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | RESERVED                  | E |  PROLOGUE-SIZE2 |  PROLOGUE-SIZE1 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

><br>

> The E flag indicates that the unwind group ends with a standard 1-byte<br>

> RET instruction.<br>
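A minimal sketch of how an unwinder might use the E flag for an RSP-based frame: if the flag is set and the program counter sits on the final byte of the group (the 1-byte RET), the stack has already been deallocated. Here `group_end` is assumed to be the starting address of the following group, and the sketch ignores the prologue cases entirely.

```python
def rsp_unwind_action(pc, group_end, e_flag):
    """group_end: first address past the group (start of the next group)."""
    if e_flag and pc == group_end - 1:
        # Sitting on the trailing RET: stack already cut back.
        return "pop-return-address-only"
    return "deallocate-then-pop"
```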

><br>

> 3.2.2.2  RBP-Based Frame (MODE = 9)<br>

> The RBP-based frame is much the same as for a RSP-based frame except<br>

> that there are two<br>

> instructions that mark the transition between the two parts of the<br>

> epilogue: the copy of RBP to<br>

> RSP to cut back the stack, followed by the pop of RBP to restore what<br>

> may be the caller's frame<br>

> pointer. Thus the epilogue, for the purposes of unwinding, begins<br>

> immediately after the copy of<br>

> RBP to RSP. In the vast majority of cases, the immediately following two<br>

> instructions will<br>

> consist of POP RBP and RET. The POP is not idempotent because it changes<br>

> the stack pointer,<br>

> but the unwind processing code can distinguish whether the stack has<br>

> been popped or not based<br>

> on the address (3 versus 1 byte less than the highest address of the group).<br>

><br>

> It suffices to include a single 1-bit field, E, in the upper part of the<br>

> extended compact unwind<br>

> description:<br>

><br>

> | 63                       48 47 46             40 39            32 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

> | RESERVED                  | E |  PROLOGUE-SIZE2 |  PROLOGUE-SIZE1 |<br>

> |-----------------------------<wbr>------------------------------<wbr>--------|<br>

><br>

> The E flag indicates that the unwind group ends with a standard<br>

> 3-instruction return sequence.<br>

><br>

> 3.3  Special Issues<br>

> Up until this point, it might appear that each unwind group corresponds<br>

> one-for-one to the code<br>

> for a single function. While common, this is not required. Important<br>

> variations are illustrated<br>

> here.<br>

><br>

> Issue/Problem: The first part of the prologue is too large to be described by the 8-bit<br>

> RxP-FRAME-SIZE1 field.<br>

> Possible Resolution: Use two unwind groups:<br>

>   1) a null frame group of any appropriate size,<br>

>   2) a suitable RxP-based frame.<br>

><br>

> Issue/Problem: The second part of the prologue is too large to be described by the 8-bit<br>

> RxP-FRAME-SIZE2 field.<br>

> Possible Resolution: Use three unwind groups:<br>

>   1) a RSP-frame group to describe the first part of the prologue only (PROLOGUE-SIZE1 =<br>

>      the length of the entire group and PROLOGUE-SIZE2 = 0),<br>

>   2) a RSP-frame group to describe the second part of the prologue (PROLOGUE-SIZE1 & 2<br>

>      both = 0 and no registers saved),<br>

>   3) a suitable RxP-based frame group (with PROLOGUE-SIZE1 & 2 both = 0).<br>

><br>

> Issue/Problem: There is more than one return location (without a shared epilogue<br>

> sequence).<br>

> Possible Resolution: Generally use one unwind group for each sequence of code ending with<br>

> a RET. A group following a RET will have no prologue (PROLOGUE-SIZE1 & 2 both = 0).<br>

><br>

> Issue/Problem: There is more than one entry point to a function (for example, a FORTRAN<br>

> multiple entry point subroutine or analogous assembly language equivalents).<br>

> Possible Resolution: Generally use one unwind group for each sequence of code beginning<br>

> at an entry point. Each group may have a normal prologue and all but the last might have<br>

> NO epilogue (E flag clear).<br>

><br>

><br>

> These examples are not exhaustive, of course. But they illustrate the flexibility of the<br>

> mechanism, which should be suitable for the vast majority of cases in practice.<br>

><br>

> 3.4  Number of Unwind Groups<br>

> A simple concatenation of unwind groups by the linker that combines unwind group sections<br>

> from multiple modules does not, of itself, provide direct information to the unwind<br>

> software regarding how many unwind groups are present. However, the image information (ELF<br>

> segments) should suffice to determine this number based on the size of the segment and the<br>

> known size of each group. Alternatively, the linker might create special symbols that can<br>

> be used to help determine the number of groups.<br>

><br>

> This proposal leaves this detail for the target-specific linker, image loader, and<br>

> exception handling system to specify.<br>

><br>
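A minimal sketch of the segment-size approach described in 3.4, assuming the unwind groups are stored back to back with no padding and that every group has the same known size. The function name and both parameters are hypothetical; the proposal leaves the actual group size and lookup mechanism to the target.<br>

```python
def count_unwind_groups(segment_size, group_size):
    """Derive the number of unwind groups from the size of the segment.

    segment_size: size in bytes of the segment holding the concatenated groups.
    group_size:   known, fixed size in bytes of one unwind group (target-specific).
    """
    if group_size <= 0:
        raise ValueError("group size must be positive")
    count, remainder = divmod(segment_size, group_size)
    if remainder:
        # A partial group suggests padding or a wrong group size assumption.
        raise ValueError("segment size is not a whole number of groups")
    return count
```

Rejecting a non-zero remainder is a design choice for this sketch: it surfaces padding or a mismatched group size instead of silently rounding down.<br>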

> 3.5  Interaction with Code Generation<br>

> The relatively simple and fixed-size nature of the extended compact unwind information<br>

> proposed here depends on observing certain restrictions on optimization that might affect<br>

> code in prologues and epilogues. These restrictions were noted where relevant throughout<br>

> this section. Here is a summary of those requirements:<br>

><br>

>  1. Use of any preserved register must be delayed until all of the preserved registers<br>

>     have been saved.<br>

>  2. In the prologue of a function with an RBP-based frame, the two instructions that save<br>

>     the prior RBP contents and copy the stack pointer to RBP must be kept adjacent.<br>

>     Similarly, in the epilogue of a function with an RBP-based frame, the two instructions<br>

>     that cut back the stack and restore the prior RBP contents must be kept adjacent.<br>

>     [This adjacency requirement could be relaxed by introducing an additional MODE that<br>

>     corresponds to having the RBP saved in memory but not yet updated from RSP. This does<br>

>     not seem worthwhile.]<br>

>  3. For shrink-wrap optimization in the presence of a handler/personality routine, care<br>

>     must be taken not to move instructions into the prologue that might cause an exception.<br>

>     Floating point exceptions and access violations are of particular concern in this<br>

>     regard.<br>

><br>

> 4  System Specific Extensions for OpenVMS<br>

> Omitted, since nobody other than OpenVMS should care. It describes adding a further<br>

> four-byte extension to the extended compact unwind group to pass along some information<br>

> from the VAX Macro-32 compiler about VAX register usage.<br>

><br>

><br>

><br>

> ______________________________<wbr>_________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

<br>


</div></div></blockquote></div><br></div>