[llvm-dev] RFC: LLD range extension thunks

Thu Jan 5 19:21:50 PST 2017

On Thu, Jan 5, 2017 at 8:15 PM, Peter Smith <peter.smith at linaro.org> wrote:

> Hello Rui,
>
> Thanks for the comments
>
> - Synthetic sections and rewriting relocations
> I think that this would definitely be worth trying. It should remove
> the need for thunks to be represented in the core data structures, and
> would allow .
>

Creating symbols for thunks would have another benefit: it makes
disassembled output easier to read because thunks have names.

> It would also mean that we wouldn't have to associate symbols with
> thunks as the relocations would directly target the thunks. ARM
> interworking makes reusing thunks more difficult as not every thunk is
> compatible with every caller. For example:
> ARM B target and Thumb2 B.W target can't reuse the same thunk even if
> in range as the branch instruction can't change state.
>
> I think it is worth an experiment to make the existing implementation
> of thunks use synthetic sections and rewriting relocations before
> trying to implement range extension thunks.
>
> - Yes the scan is linear it is essentially:
> do
>     assign addresses to input sections
>     for each relocation
>         if (thunk needed)
>             create thunk or reuse existing one
> while (no more thunks added)
>
> There's quite a lot of complexity that can be added with respect to
> the placement of thunks within the output section. For example if
> there is a caller with a low address and a caller with a high address,
> both might be able to reuse a thunk placed in the middle. I think it
> is worth starting simple though.

I agree. I believe that computing the best thunk positions is NP-hard, but
the best layout and a layout produced by a naive algorithm wouldn't be that
different.

> Peter
>
>
> On 5 January 2017 at 09:52, Rui Ueyama <ruiu at google.com> wrote:
> > Hi Peter,
> >
> > Here are my comments:
> >
> > - I didn't think hard enough, but I believe creating thunks as synthetic
> > sections instead of attached data for other input sections is towards a
> > right direction, because synthetic sections are suitable for adding
> > linker-generated data to output files.
> >
> > - As you wrote, we need to iterate relocations at least twice to create
> > range extension thunks. Each iteration can be a linear scan, correct? I
> > mean, we can start from the section at the lowest address towards higher
> > address examining relocations and create thunks if targets are too far.
> >
> > - I do not see a reason that we need to associate range extension thunks
> to
> > symbols. It seems to me that while scanning relocations, we need to keep
> > only the last thunk address for each symbol. If we find that some
> relocation
> > against symbol S needs a range extension thunk, we first check if the
> last
> > thunk for S is within the range and reuse it if it is. In this way, we
> need
> > to keep only one thunk for one symbol at any moment.
> >
> > - Have you considered rewriting relocations? I think, if we find that
> > relocation R pointing to symbol S needs a range extension thunk, we
> should
> > (1) create a range extension thunk, (2) create a symbol body object S'
> for
> > the thunk, (3) and rewrite R to point to S' instead of S. Then later
> passes
> > don't have to deal with thunks.
> >
> >
> > On Thu, Jan 5, 2017 at 3:34 AM, Peter Smith <peter.smith at linaro.org>
> wrote:
> >>
> >> I'm about to start working on range extension thunks in lld. This is
> >> an attempt to summarize the approach I'd like to take and what the
> >> impact will be on lld outside of thunks. I'm interested if anyone has
> >> any constraints the approach will break, alternative suggestions, or
> >> is working on something I'll need to take account of?
> >>
> >> I expect range extension thunks to be important for ARM and useful for
> >> AArch64. In principle any target with a limited range branch immediate
> >> instruction could benefit from them.
> >>
> >> I've put some detail about range extension thunks at the end of this
> >> message for those not familiar with them. The name range extension
> >> thunks is by no means universal, for example in ARM's ABI
> >> documentation they are called veneers, The GNU linker calls them
> >> stubs.
> >>
> >> Summary of existing thunk implementation (ARM interworking and Mips
> >> PIC to non-PIC calls):
> >> - A Regular, Shared or Undefined symbol may have a single thunk
> >> - For each relocation to a symbol S, if we need a thunk we use the
> >> thunk for S as the target of the relocation rather than S. The thunk
> >> will transfer control to S.
> >> - Thunks are assigned to an InputSection, these are written out when
> >> the InputSection is written. The InputSection with the Thunk contains
> >> either the caller (ARM) or callee (Mips).
> >> - For all existing thunks, the decision of whether a thunk is needed
> >> is not dependent on address. A Thumb branch to ARM will always need a
> >> thunk, no matter the distance. Thunks can therefore be generated
> >> relatively early in Writer::run().
> >>
> >> High level impact of range extension thunks:
> >>
> >> There may be more than one than one thunk per callee:
> >> - A range extension thunk must be placed within range of the caller,
> >> there may be cases where no single thunk for a callee is in range of
> >> all callers.
> >> - An ARM Target may need a different Thunk for ARM and Thumb callers.
> >>
> >> Address information is needed to determine if a range extension thunk is
> >> needed:
> >> - The more precise the address information available the less thunks
> >> will be generated. the most precise address information is the final
> >> address of caller and callee is known at thunk creation time, the
> >> least precise is neither the address of the caller or callee is known.
> >>
> >> Range extension thunks can be combined or replace other thunks
> >> - Thunks may also be used for instruction set interworking (ARM) or
> >> for calling between position independent and non-position independent
> >> code (Mips). Either a chain of thunks or a combined thunk that does
> >> both operations is needed. For ARM all range extension thunks can
> >> trivially be interworking thunks as well.
> >>
> >> Range extension thunk placement can be important
> >> - Many callers may need a range extension. Placing a range extension
> >> thunk so that it is in range of the most callers minimizes number of
> >> thunks needed.
> >> - Thunks may be better as synthetic sections rather than as additions
> >> to input sections.
> >>
> >> Adding/removing content must not break the range calculations used in
> >> range extension thunks.
> >> - If any caller, callee or thunk address is changed after range
> >> extension thunks are calculated it could invalidate the range
> >> calculation.
> >> - Ideally range extension thunks are the last operation the linker
> >> does prior to resolving relocations.
> >>
> >> I think that there are two separate areas to a range extension thunk
> >> implementation that can be considered separately.
> >> 1.) Moving thunk generation to a later stage, at a minimum we need an
> >> estimation of the address of each caller and callee, in an ideal world
> >> we know the final address of each caller and callee. This could mean
> >> assigning section addresses multiple times.
> >> 2.) The alterations to the core data structures to permit more than
> >> one Thunk per symbol and the logic to select the "right" Thunk for
> >> each relocation.
> >>
> >> The design I'd like to aim at moves thunk creation into
> >> finalizeSections() at a point where the sizes and addresses of all the
> >> SyntheticSections are known. This would mean that the final address of
> >> each caller and callee could be used, and after thunk creation there
> >> would be no further content changes. This would mean:
> >> - All code that runs prior to thunk creation may have the offset in
> >> the OutputSection altered by the addition of thunks. In particular
> >> scanRelocs() calculates the offset in the OutputSection of some
> >> relocations. We would need to find alternative ways of handling these
> >> cases so that they could either survive thunk creation or be patched
> >> up afterwards.
> >> - assignAddresses() will need to run at least twice if thunks are
> >> created. At least once to give the thunk creation the caller and
> >> callee addresses, and at least once after all thunks have been
> >> created.
> >>
> >> There is an alternative design that only uses estimates of caller and
> >> callee address to decide if a thunk is needed. In effect we use a
> >> heuristic to predict how much extra synthetic content, such as plt and
> >> got size, will be added after Thunk creation when deciding if a Thunk
> >> is needed. I'm not in favour of this approach as from bitter
> >> experience it tends to result in hard to debug problems when the
> >> heuristics break down. Precise addresses would also allow errata
> >> patching thunks [*]
> >>
> >> I've not thought too hard about how to alter the core data structures
> >> yet. I think this will mostly be implementation detail though.
> >>
> >> Next steps:
> >> I'd like to proceed with the following plan:
> >> 1.) Move the existing thunk implementation to where it would need to
> >> be in finalizeSections(). This should flush out all the non-thunk
> >> related assumptions about addresses without adding any existing
> >> complexity to the Thunk implementation.
> >> 2.) Add support for multiple thunks per symbol
> >> 3.) If it turns out to be a good idea, implement thunks as
> >> SyntheticSections
> >> 4.) Add support for range extensions.
> >>
> >> I think the first implementation of range extension thunks should be
> >> simple and not try too hard to minimize the number of thunks needed.
> >> If there is a need to optimize it can be done later as the changes
> >> should be within the thunk creation module.
> >>
> >> Thanks for reading
> >>
> >> Peter
> >>
> >> The remainder of the message is a brief explanation of range extension
> >> and errata patching thunks.
> >>
> >> What are range extension thunks?
> >> Many architectures have branch instructions that have a finite range
> >> that is insufficient to reach all possible program locations. For
> >> example the ARM branch immediate instruction has an immediate that
> >> encodes an offset of +-32Mb from the branch instruction. A range
> >> extension thunk is a linker generated code sequence, inserted between
> >> the caller and the callee, that completes the transfer of control to
> >> the callee when the distance between the caller and callee exceeds the
> >> range of the branch instruction. A simple example in psuedo assembly
> >> for a non-position independent ARM function call.
> >>
> >> source:
> >> BL long_range_thunk_to_target
> >> ...
> >> long_range_thunk_to_target
> >> LDR r12, target_address ; r12 is the corruptible interprocedural
> >> scratch register (ip)
> >> BX r12
> >> target_address:
> >> .word target ;
> >> ...
> >> target:
> >> ...
> >>
> >> What is an errata patching thunk?
> >> Some CPU errata (hardware bugs) can be fixed at a link time by
> >> replacing an instruction with a branch to a sequence of equivalent
> >> instructions that are guaranteed to to not trigger the erratum. In
> >> some cases the trigger sequence is dependent on precise addresses such
> >> as immediates crossing page boundaries, for example
> >> https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html . Errata
> >> patching is out of the scope of implementing range extension thunks
> >> but can be seen as a generalization of it.
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170106/d58da420/attachment-0001.html>