<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jan 5, 2017 at 8:15 PM, Peter Smith <span dir="ltr"><<a href="mailto:peter.smith@linaro.org" target="_blank">peter.smith@linaro.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello Rui,<br>

<br>

Thanks for the comments.<br>

<br>

- Synthetic sections and rewriting relocations<br>

I think that this would definitely be worth trying. It should remove<br>

the need for thunks to be represented in the core data structures, and<br>

would allow .<br></blockquote><div><br></div><div>Creating symbols for thunks would have another benefit: it makes disassembled output easier to read because thunks have names.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

It would also mean that we wouldn't have to associate symbols with<br>

thunks as the relocations would directly target the thunks. ARM<br>

interworking makes reusing thunks more difficult as not every thunk is<br>

compatible with every caller. For example:<br>

ARM B target and Thumb2 B.W target can't reuse the same thunk even if<br>

in range as the branch instruction can't change state.<br>

<br>

I think it is worth an experiment to make the existing implementation<br>

of thunks use synthetic sections and rewriting relocations before<br>

trying to implement range extension thunks.<br>

<br>

- Yes, the scan is linear; it is essentially:<br>

do<br>

    assign addresses to input sections<br>

    for each relocation<br>

        if (thunk needed)<br>

            create thunk or reuse existing one<br>

while (thunks were added)<br>
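<br>
The loop above can be sketched in Python roughly as follows. This is a minimal illustration, not lld code: Section, Reloc, assign_addresses, create_thunks, the 12-byte thunk size, and the policy of placing a thunk directly after the caller's section are all assumptions made for the example, and the range check is a simple symmetric distance test.<br>

```python
from dataclasses import dataclass
from typing import Optional

THUNK_SIZE = 12  # assumed size of one thunk, in bytes


@dataclass(eq=False)  # identity comparison, so list.index() finds this exact object
class Section:
    name: str
    size: int
    addr: int = 0


@dataclass(eq=False)
class Reloc:
    section: Section                 # section containing the branch
    offset: int                      # offset of the branch within that section
    dest: Section                    # original destination of the branch
    thunk: Optional[Section] = None  # set once the branch is redirected


def assign_addresses(sections):
    """Lay the sections out back to back starting at address 0."""
    addr = 0
    for s in sections:
        s.addr = addr
        addr += s.size


def create_thunks(sections, relocs, branch_range):
    """Repeat layout and scanning until a pass changes nothing; return the pass count."""
    thunks = {}  # (caller section, destination) -> thunk, for reuse
    passes = 0
    while True:
        passes += 1
        assign_addresses(sections)
        changed = False
        for r in relocs:
            src = r.section.addr + r.offset
            target = r.thunk if r.thunk is not None else r.dest
            if abs(target.addr - src) <= branch_range:
                continue
            if r.thunk is not None:
                # Only happens if a section is larger than the branch range.
                raise RuntimeError(f"thunk {r.thunk.name} is out of range of its caller")
            key = (r.section, r.dest)
            if key not in thunks:
                thunk = Section(f"thunk_{r.section.name}_to_{r.dest.name}", THUNK_SIZE)
                sections.insert(sections.index(r.section) + 1, thunk)
                thunks[key] = thunk
            r.thunk = thunks[key]  # redirect the branch to the thunk
            changed = True         # addresses are now stale, so run another pass
        if not changed:
            return passes
```

Each relocation is redirected at most once, so the loop terminates after at most one pass per relocation plus a final pass that verifies the layout. Addresses go stale within a pass as soon as a thunk is inserted, which is why every redirect forces another pass.<br>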

<br>

There's quite a lot of complexity that can be added with respect to<br>

the placement of thunks within the output section. For example if<br>

there is a caller with a low address and a caller with a high address,<br>

both might be able to reuse a thunk placed in the middle. I think it<br>

is worth starting simple though.</blockquote><div><br></div><div>I agree. I believe that computing the best thunk positions is NP-hard, but the best layout and a layout produced by a naive algorithm wouldn't be that different.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888"><br>

Peter<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

On 5 January 2017 at 09:52, Rui Ueyama <<a href="mailto:ruiu@google.com">ruiu@google.com</a>> wrote:<br>

> Hi Peter,<br>

><br>

> Here are my comments:<br>

><br>

> - I haven't thought it through fully, but I believe creating thunks as synthetic<br>

> sections, instead of as data attached to other input sections, is a step in the<br>

> right direction, because synthetic sections are suitable for adding<br>

> linker-generated data to output files.<br>

><br>

> - As you wrote, we need to iterate relocations at least twice to create<br>

> range extension thunks. Each iteration can be a linear scan, correct? I<br>

> mean, we can start from the section at the lowest address towards higher<br>

> address examining relocations and create thunks if targets are too far.<br>

><br>

> - I do not see a reason that we need to associate range extension thunks with<br>

> symbols. It seems to me that while scanning relocations, we need to keep<br>

> only the last thunk address for each symbol. If we find that some relocation<br>

> against symbol S needs a range extension thunk, we first check if the last<br>

> thunk for S is within the range and reuse it if it is. In this way, we need<br>

> to keep only one thunk for one symbol at any moment.<br>

><br>

> - Have you considered rewriting relocations? I think, if we find that<br>

> relocation R pointing to symbol S needs a range extension thunk, we should<br>

> (1) create a range extension thunk, (2) create a symbol body object S' for<br>

> the thunk, (3) and rewrite R to point to S' instead of S. Then later passes<br>

> don't have to deal with thunks.<br>
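<br>
A rough Python sketch of the two ideas above: keep only the most recent thunk for each symbol during a linear scan, and rewrite each relocation that needs a thunk so that it points at the thunk's own symbol S'. Everything here is hypothetical and simplified for illustration: the Reloc type, the scan function, the __thunk_ naming scheme, placing a new thunk at the caller's own address as a stand-in for "close to the caller", and ignoring the way inserted thunks shift later addresses.<br>

```python
from dataclasses import dataclass


@dataclass
class Reloc:
    addr: int     # address of the branch instruction
    symbol: str   # name of the symbol the branch refers to


def scan(relocs, symbols, branch_range):
    """Linear scan from low to high address.

    `symbols` maps symbol name -> address and gains one entry per thunk.
    Relocations that cannot reach their target are rewritten in place to
    refer to a thunk symbol. Returns the list of thunk symbols created.
    """
    last_thunk = {}  # original symbol -> name of the most recent thunk for it
    created = []
    for r in sorted(relocs, key=lambda r: r.addr):
        if abs(symbols[r.symbol] - r.addr) <= branch_range:
            continue  # the direct branch reaches S, nothing to do
        orig = r.symbol
        thunk = last_thunk.get(orig)
        if thunk is None or abs(symbols[thunk] - r.addr) > branch_range:
            # (1) create a thunk near the caller, (2) give it a symbol S'
            thunk = f"__thunk_{len(created)}_{orig}"
            symbols[thunk] = r.addr
            last_thunk[orig] = thunk
            created.append(thunk)
        r.symbol = thunk  # (3) rewrite R to point at S' instead of S
    return created
```

With a target at address 10000 and a range of 1000, callers at 0 and 100 share one thunk, a caller at 2000 gets a second one, and a caller at 9500 keeps its direct branch. After the scan, later passes see only ordinary relocations against ordinary symbols.<br>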

><br>

><br>

> On Thu, Jan 5, 2017 at 3:34 AM, Peter Smith <<a href="mailto:peter.smith@linaro.org">peter.smith@linaro.org</a>> wrote:<br>

>><br>

>> I'm about to start working on range extension thunks in lld. This is<br>

>> an attempt to summarize the approach I'd like to take and what the<br>

>> impact will be on lld outside of thunks. I'd like to hear whether anyone has<br>

>> constraints that the approach will break, has alternative suggestions, or<br>

>> is working on something I'll need to take account of.<br>

>><br>

>> I expect range extension thunks to be important for ARM and useful for<br>

>> AArch64. In principle any target with a limited range branch immediate<br>

>> instruction could benefit from them.<br>

>><br>

>> I've put some detail about range extension thunks at the end of this<br>

>> message for those not familiar with them. The name range extension<br>

>> thunks is by no means universal; for example, in ARM's ABI<br>

>> documentation they are called veneers, and the GNU linker calls them<br>

>> stubs.<br>

>><br>

>> Summary of existing thunk implementation (ARM interworking and Mips<br>

>> PIC to non-PIC calls):<br>

>> - A Regular, Shared or Undefined symbol may have a single thunk<br>

>> - For each relocation to a symbol S, if we need a thunk we use the<br>

>> thunk for S as the target of the relocation rather than S. The thunk<br>

>> will transfer control to S.<br>

>> - Thunks are assigned to an InputSection, these are written out when<br>

>> the InputSection is written. The InputSection with the Thunk contains<br>

>> either the caller (ARM) or callee (Mips).<br>

>> - For all existing thunks, the decision of whether a thunk is needed<br>

>> is not dependent on address. A Thumb branch to ARM will always need a<br>

>> thunk, no matter the distance. Thunks can therefore be generated<br>

>> relatively early in Writer::run().<br>

>><br>

>> High level impact of range extension thunks:<br>

>><br>

>> There may be more than one thunk per callee:<br>

>> - A range extension thunk must be placed within range of the caller,<br>

>> there may be cases where no single thunk for a callee is in range of<br>

>> all callers.<br>

>> - An ARM Target may need a different Thunk for ARM and Thumb callers.<br>

>><br>

>> Address information is needed to determine if a range extension thunk is<br>

>> needed:<br>

>> - The more precise the address information available, the fewer thunks<br>

>> will be generated. The most precise case is when the final<br>

>> address of both caller and callee is known at thunk creation time; the<br>

>> least precise is when the address of neither the caller nor the callee is known.<br>

>><br>

>> Range extension thunks can be combined with or replace other thunks<br>

>> - Thunks may also be used for instruction set interworking (ARM) or<br>

>> for calling between position independent and non-position independent<br>

>> code (Mips). Either a chain of thunks or a combined thunk that does<br>

>> both operations is needed. For ARM all range extension thunks can<br>

>> trivially be interworking thunks as well.<br>

>><br>

>> Range extension thunk placement can be important<br>

>> - Many callers may need a range extension. Placing a range extension<br>

>> thunk so that it is in range of the most callers minimizes the number of<br>

>> thunks needed.<br>

>> - Thunks may be better as synthetic sections rather than as additions<br>

>> to input sections.<br>

>><br>

>> Adding/removing content must not break the range calculations used in<br>

>> range extension thunks.<br>

>> - If any caller, callee or thunk address is changed after range<br>

>> extension thunks are calculated it could invalidate the range<br>

>> calculation.<br>

>> - Ideally range extension thunks are the last operation the linker<br>

>> does prior to resolving relocations.<br>

>><br>

>> I think that there are two separate areas to a range extension thunk<br>

>> implementation that can be considered separately.<br>

>> 1.) Moving thunk generation to a later stage. At a minimum we need an<br>

>> estimate of the address of each caller and callee; in an ideal world<br>

>> we know the final address of each caller and callee. This could mean<br>

>> assigning section addresses multiple times.<br>

>> 2.) The alterations to the core data structures to permit more than<br>

>> one Thunk per symbol and the logic to select the "right" Thunk for<br>

>> each relocation.<br>

>><br>

>> The design I'd like to aim at moves thunk creation into<br>

>> finalizeSections() at a point where the sizes and addresses of all the<br>

>> SyntheticSections are known. This would mean that the final address of<br>

>> each caller and callee could be used, and after thunk creation there<br>

>> would be no further content changes. This would mean:<br>

>> - Any offset in the OutputSection computed by code that runs prior to<br>

>> thunk creation may be altered by the addition of thunks. In particular<br>

>> scanRelocs() calculates the offset in the OutputSection of some<br>

>> relocations. We would need to find alternative ways of handling these<br>

>> cases so that they could either survive thunk creation or be patched<br>

>> up afterwards.<br>

>> - assignAddresses() will need to run at least twice if thunks are<br>

>> created. At least once to give the thunk creation the caller and<br>

>> callee addresses, and at least once after all thunks have been<br>

>> created.<br>

>><br>

>> There is an alternative design that only uses estimates of caller and<br>

>> callee address to decide if a thunk is needed. In effect we use a<br>

>> heuristic to predict how much extra synthetic content, such as plt and<br>

>> got size, will be added after Thunk creation when deciding if a Thunk<br>

>> is needed. I'm not in favour of this approach as from bitter<br>

>> experience it tends to result in hard to debug problems when the<br>

>> heuristics break down. Precise addresses would also allow errata<br>

>> patching thunks [*]<br>

>><br>

>> I've not thought too hard about how to alter the core data structures<br>

>> yet. I think this will mostly be implementation detail though.<br>

>><br>

>> Next steps:<br>

>> I'd like to proceed with the following plan:<br>

>> 1.) Move the existing thunk implementation to where it would need to<br>

>> be in finalizeSections(). This should flush out all the non-thunk<br>

>> related assumptions about addresses without adding any additional<br>

>> complexity to the Thunk implementation.<br>

>> 2.) Add support for multiple thunks per symbol<br>

>> 3.) If it turns out to be a good idea, implement thunks as<br>

>> SyntheticSections<br>

>> 4.) Add support for range extensions.<br>

>><br>

>> I think the first implementation of range extension thunks should be<br>

>> simple and not try too hard to minimize the number of thunks needed.<br>

>> If there is a need to optimize it can be done later as the changes<br>

>> should be within the thunk creation module.<br>

>><br>

>> Thanks for reading<br>

>><br>

>> Peter<br>

>><br>

>> The remainder of the message is a brief explanation of range extension<br>

>> and errata patching thunks.<br>

>><br>

>> What are range extension thunks?<br>

>> Many architectures have branch instructions that have a finite range<br>

>> that is insufficient to reach all possible program locations. For<br>

>> example the ARM branch immediate instruction has an immediate that<br>

>> encodes an offset of +-32MB from the branch instruction. A range<br>

>> extension thunk is a linker generated code sequence, inserted between<br>

>> the caller and the callee, that completes the transfer of control to<br>

>> the callee when the distance between the caller and callee exceeds the<br>

>> range of the branch instruction. Here is a simple example in pseudo assembly<br>

>> for a non-position independent ARM function call:<br>

>><br>

>> source:<br>

>> BL long_range_thunk_to_target<br>

>> ...<br>

>> long_range_thunk_to_target:<br>

>> LDR r12, target_address ; r12 (ip) is the corruptible intra-procedure-call scratch register<br>

>> BX r12<br>

>> target_address:<br>

>> .word target<br>

>> ...<br>

>> target:<br>

>> ...<br>
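<br>
To make the range limit concrete, here is a small Python check of whether an ARM-state B/BL at a given address can reach a target directly. It relies on the A32 encoding, in which the 24-bit immediate is shifted left by two and sign-extended, and the offset is taken relative to the branch address plus 8. The function name arm_branch_in_range and the constant names are made up for this sketch, and interworking (ARM to Thumb) is ignored.<br>

```python
ARM_PC_BIAS = 8               # in ARM state the PC reads as the instruction address + 8
MIN_OFFSET = -(1 << 25)       # most negative encodable offset: -32 MiB
MAX_OFFSET = (1 << 25) - 4    # most positive encodable offset: 32 MiB - 4


def arm_branch_in_range(branch_addr: int, target_addr: int) -> bool:
    """Return True if an ARM-state B/BL at branch_addr can reach target_addr."""
    offset = target_addr - (branch_addr + ARM_PC_BIAS)
    if offset % 4 != 0:
        return False  # the encoding cannot express a non-word-aligned offset
    return MIN_OFFSET <= offset <= MAX_OFFSET
```

A linker would fall back to a thunk such as long_range_thunk_to_target above whenever a check of this kind fails for the final addresses of caller and callee.<br>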

>><br>

>> What is an errata patching thunk?<br>

>> Some CPU errata (hardware bugs) can be fixed at link time by<br>

>> replacing an instruction with a branch to a sequence of equivalent<br>

>> instructions that are guaranteed not to trigger the erratum. In<br>

>> some cases the trigger sequence is dependent on precise addresses such<br>

>> as immediates crossing page boundaries, for example<br>

>> <a href="https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html" rel="noreferrer" target="_blank">https://sourceware.org/ml/<wbr>binutils-cvs/2015-04/msg00012.<wbr>html</a> . Errata<br>

>> patching is outside the scope of implementing range extension thunks<br>

>> but can be seen as a generalization of them.<br>

><br>

><br>

</div></div></blockquote></div><br></div></div>