[llvm-dev] RFC: LLD range extension thunks

Wed Jan 18 07:54:46 PST 2017

Sorry for being late on the thread, but I just wanted to say that I
agree with the design. The problem is very similar to relaxation in MC
and should probably have a similar solution:

* Keep all offsets relative to input/synthetic sections (fragments in
  MC).
* Compute addresses.
* If anything is not in range add a thunk (relax in MC).
* Repeat.
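
The loop above can be sketched in a few lines of Python. This is a toy model, not lld or MC code. Every name is hypothetical, sections are laid out back to back, a branch is assumed to reach anything within BRANCH_RANGE bytes, and each thunk is appended to its caller's section and uses an absolute LDR/BX sequence, so the thunk itself can reach any callee.

```python
from dataclasses import dataclass, field

BRANCH_RANGE = 32 * 1024 * 1024  # toy reach of a direct branch, in bytes
THUNK_SIZE = 12                  # e.g. LDR + BX + a literal word

@dataclass
class Section:
    name: str
    size: int
    thunks: set = field(default_factory=set)  # callees with a thunk here
    addr: int = 0

@dataclass
class Call:
    section: str  # caller section
    offset: int   # offset of the branch within the caller section
    callee: str   # target section, standing in for the callee symbol

def assign_addresses(sections):
    # Offsets stay section-relative; only section addresses are recomputed.
    addr = 0
    for s in sections:
        s.addr = addr
        addr += s.size + THUNK_SIZE * len(s.thunks)

def create_thunks(sections, calls):
    """Relax until no new thunk is needed; return the number of passes."""
    by_name = {s.name: s for s in sections}
    passes = 0
    while True:
        passes += 1
        assign_addresses(sections)
        changed = False
        for c in calls:
            src = by_name[c.section]
            if c.callee in src.thunks:
                continue  # already routed through a thunk
            distance = by_name[c.callee].addr - (src.addr + c.offset)
            if abs(distance) > BRANCH_RANGE:
                src.thunks.add(c.callee)  # grows src, so addresses move
                changed = True
        if not changed:
            return passes

# 'c' starts just in range; the thunk added for 'far' pushes it out.
sections = [
    Section("a", 0x10),
    Section("x", BRANCH_RANGE - 0x14),
    Section("c", 0x10),
    Section("y", 64 * 1024 * 1024),
    Section("far", 0x10),
]
calls = [Call("a", 0, "far"), Call("a", 0, "c")]
passes = create_thunks(sections, calls)
```

Here three passes are needed: the second exists only because the first thunk moved 'c' out of range. The loop terminates because thunks are only ever added and each (caller section, callee) pair gets at most one.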

Cheers,
Rafael

Peter Smith <peter.smith at linaro.org> writes:

> I'm about to start working on range extension thunks in lld. This is
> an attempt to summarize the approach I'd like to take and what the
> impact will be on lld outside of thunks. I'd like to hear whether
> anyone has constraints that the approach would break, alternative
> suggestions, or is working on something I'll need to take into account.
>
> I expect range extension thunks to be important for ARM and useful for
> AArch64. In principle any target with a limited range branch immediate
> instruction could benefit from them.
>
> I've put some detail about range extension thunks at the end of this
> message for those not familiar with them. The name "range extension
> thunk" is by no means universal: ARM's ABI documentation calls them
> veneers, and the GNU linker calls them stubs.
>
> Summary of existing thunk implementation (ARM interworking and Mips
> PIC to non-PIC calls):
> - A Regular, Shared or Undefined symbol may have at most one thunk.
> - For each relocation to a symbol S, if we need a thunk we use the
> thunk for S as the target of the relocation rather than S. The thunk
> will transfer control to S.
> - Thunks are assigned to an InputSection and are written out when
> that InputSection is written. The InputSection holding the Thunk
> contains either the caller (ARM) or the callee (Mips).
> - For all existing thunks, the decision of whether a thunk is needed
> is not dependent on address. A Thumb branch to ARM will always need a
> thunk, no matter the distance. Thunks can therefore be generated
> relatively early in Writer::run().
>
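For comparison, a toy model of the current, address-independent scheme. All names are hypothetical and the only thunk trigger assumed is an instruction-set change, so no addresses are consulted and one thunk per symbol is enough: every caller that needs a thunk to S is in the opposite state from S, so they can all share it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reloc:
    caller_is_thumb: bool
    symbol: str

def needs_interworking_thunk(caller_is_thumb, callee_is_thumb):
    # Toy rule: a state change always needs a thunk, whatever the distance.
    return caller_is_thumb != callee_is_thumb

def create_thunks(relocs, symbol_is_thumb):
    thunk_for = {}  # symbol -> its single thunk
    targets = []    # what each relocation ends up pointing at
    for r in relocs:
        if needs_interworking_thunk(r.caller_is_thumb, symbol_is_thumb[r.symbol]):
            targets.append(thunk_for.setdefault(r.symbol, "__thunk_" + r.symbol))
        else:
            targets.append(r.symbol)
    return thunk_for, targets

symbol_is_thumb = {"arm_fn": False, "thumb_fn": True}
relocs = [Reloc(True, "arm_fn"), Reloc(True, "arm_fn"),
          Reloc(False, "arm_fn"), Reloc(True, "thumb_fn")]
thunk_for, targets = create_thunks(relocs, symbol_is_thumb)
```

Both Thumb callers of arm_fn share one thunk, and the ARM caller branches to arm_fn directly.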
> High level impact of range extension thunks:
>
> There may be more than one thunk per callee:
> - A range extension thunk must be placed within range of the caller;
> there may be cases where no single thunk for a callee is in range of
> all callers.
> - An ARM Target may need a different Thunk for ARM and Thumb callers.
>
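A sketch of what "more than one thunk per callee" could look like, assuming a toy symmetric branch reach and hypothetical names throughout. Thunks are keyed by callee and caller instruction set, and an existing thunk is reused only if this caller can reach it.

```python
REACH = 32 * 1024 * 1024  # toy symmetric branch reach, in bytes

class ThunkTable:
    """Several thunks per callee, keyed by callee and caller state."""

    def __init__(self):
        self.thunks = {}  # (callee, caller_is_thumb) -> thunk addresses

    def get_or_create(self, callee, caller_addr, caller_is_thumb, new_addr):
        """Return (thunk address, created?). new_addr is where a new thunk
        would be placed; whoever calls this must pick it in range."""
        candidates = self.thunks.setdefault((callee, caller_is_thumb), [])
        for addr in candidates:
            if abs(addr - caller_addr) <= REACH:
                return addr, False  # reuse a thunk this caller can reach
        candidates.append(new_addr)
        return new_addr, True

table = ThunkTable()
r1 = table.get_or_create("f", 0x1000, False, 0x2000)
r2 = table.get_or_create("f", 0x3000, False, 0x4000)        # reuses 0x2000
r3 = table.get_or_create("f", 0x5000000, False, 0x5001000)  # too far: new
r4 = table.get_or_create("f", 0x1000, True, 0x2100)         # Thumb caller: new
```

Keying on the caller's state reflects the point above that ARM and Thumb callers may need different thunks for the same callee.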
> Address information is needed to determine if a range extension thunk is needed:
> - The more precise the available address information, the fewer thunks
> will be generated. At best, the final addresses of caller and callee
> are known at thunk creation time; at worst, neither address is known.
>
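A small illustration of the precision trade-off, assuming each address is only known to lie within an interval and using a toy symmetric reach (names hypothetical). A thunk can only be skipped if the branch reaches in the worst case, so the same final layout needs no thunk with exact addresses but gets one with estimates.

```python
REACH = 32 * 1024 * 1024  # toy symmetric branch reach, in bytes

def needs_thunk(caller, callee):
    """caller and callee are (lo, hi) bounds on each address."""
    caller_lo, caller_hi = caller
    callee_lo, callee_hi = callee
    # Largest possible |callee - caller| over both intervals.
    worst = max(abs(callee_hi - caller_lo), abs(callee_lo - caller_hi))
    return worst > REACH

exact = needs_thunk((0, 0), (REACH - 16, REACH - 16))
estimated = needs_thunk((0, 64), (REACH - 16, REACH + 48))
```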
> Range extension thunks can be combined with, or replace, other thunks
> - Thunks may also be used for instruction set interworking (ARM) or
> for calling between position independent and non-position independent
> code (Mips). Either a chain of thunks or a combined thunk that does
> both operations is needed. For ARM all range extension thunks can
> trivially be interworking thunks as well.
>
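Why ARM range extension thunks can double as interworking thunks, as a sketch: a thunk of the "LDR r12, literal; BX r12" form loads its target from a literal word, and BX selects the new instruction set from bit 0 of that register (set for Thumb, clear for ARM). The function name is illustrative.

```python
def thunk_literal(callee_addr, callee_is_thumb):
    """Literal word for an 'LDR r12, lit; BX r12' style thunk."""
    # ARM code is word aligned, Thumb code halfword aligned.
    align = 2 if callee_is_thumb else 4
    if callee_addr % align:
        raise ValueError("misaligned callee address")
    # BX enters Thumb state when bit 0 is set, ARM state when it is clear.
    return (callee_addr | 1) if callee_is_thumb else callee_addr

arm_word = thunk_literal(0x8000, False)
thumb_word = thunk_literal(0x8002, True)
```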
> Range extension thunk placement can be important
> - Many callers may need a range extension. Placing a range extension
> thunk so that it is in range of as many callers as possible minimizes
> the number of thunks needed.
> - Thunks may be better as synthetic sections rather than as additions
> to input sections.
>
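Under the unrealistic assumption that a thunk can go at any address and that a caller may use any thunk within `reach` bytes, minimizing the thunks for one callee is the classic interval-covering problem, which a greedy scan in address order solves optimally. A sketch with hypothetical names; in practice a thunk can only go where the linker can insert content, such as between input sections.

```python
def place_thunks(caller_addrs, reach):
    """Return thunk addresses so every caller has one within reach bytes."""
    thunks = []
    for addr in sorted(caller_addrs):
        if thunks and addr - thunks[-1] <= reach:
            continue  # the last thunk already serves this caller
        # Farthest position the first uncovered caller can still reach,
        # so this thunk also serves as many later callers as possible.
        thunks.append(addr + reach)
    return thunks

placed = place_thunks([0, 10, 25, 100], reach=10)
```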
> Adding/removing content must not break the range calculations used in
> range extension thunks.
> - If any caller, callee or thunk address is changed after range
> extension thunks are calculated it could invalidate the range
> calculation.
> - Ideally range extension thunks are the last operation the linker
> does prior to resolving relocations.
>
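A worked example of the invalidation problem, using a toy symmetric reach (names hypothetical): a callee that is in range when the thunk decision is made goes out of range once 0x80 bytes of content, for instance a larger PLT, are later inserted between it and the caller.

```python
REACH = 32 * 1024 * 1024  # toy symmetric branch reach, in bytes

def in_range(src, dst):
    return abs(dst - src) <= REACH

caller = 0x1000
callee = caller + REACH - 0x40
decided_no_thunk = in_range(caller, callee)  # decision taken on this layout
callee += 0x80                               # content added afterwards
still_valid = in_range(caller, callee)
```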
> I think that there are two areas of a range extension thunk
> implementation that can be considered separately.
> 1.) Moving thunk generation to a later stage. At a minimum we need an
> estimate of the address of each caller and callee; ideally we know
> the final address of each caller and callee. This could mean
> assigning section addresses multiple times.
> 2.) The alterations to the core data structures to permit more than
> one Thunk per symbol and the logic to select the "right" Thunk for
> each relocation.
>
> The design I'd like to aim at moves thunk creation into
> finalizeSections() at a point where the sizes and addresses of all the
> SyntheticSections are known. The final address of each caller and
> callee could then be used, and after thunk creation there would be no
> further content changes. This would mean:
> - Any OutputSection offset computed by code that runs prior to thunk
> creation may be invalidated by the addition of thunks. In particular
> scanRelocs() calculates the offset in the OutputSection of some
> relocations. We would need to find alternative ways of handling these
> cases so that they either survive thunk creation or are patched
> up afterwards.
> - assignAddresses() will need to run at least twice if thunks are
> created. At least once to give the thunk creation the caller and
> callee addresses, and at least once after all thunks have been
> created.
>
> There is an alternative design that only uses estimates of caller and
> callee addresses to decide if a thunk is needed. In effect we use a
> heuristic to predict how much extra synthetic content, such as the PLT
> and GOT, will be added after Thunk creation when deciding if a Thunk
> is needed. I'm not in favour of this approach as, from bitter
> experience, it tends to result in hard-to-debug problems when the
> heuristics break down. Precise addresses would also make errata
> patching thunks possible [*].
>
> I've not thought too hard about how to alter the core data structures
> yet. I think this will mostly be implementation detail though.
>
> Next steps:
> I'd like to proceed with the following plan:
> 1.) Move the existing thunk implementation to where it would need to
> be in finalizeSections(). This should flush out all the non-thunk
> related assumptions about addresses without adding any extra
> complexity to the Thunk implementation.
> 2.) Add support for multiple thunks per symbol
> 3.) If it turns out to be a good idea, implement thunks as SyntheticSections
> 4.) Add support for range extensions.
>
> I think the first implementation of range extension thunks should be
> simple and not try too hard to minimize the number of thunks needed.
> If there is a need to optimize it can be done later as the changes
> should be within the thunk creation module.
>
> Thanks for reading
>
> Peter
>
> The remainder of the message is a brief explanation of range extension
> and errata patching thunks.
>
> What are range extension thunks?
> Many architectures have branch instructions with a finite range
> that is insufficient to reach all possible program locations. For
> example, the ARM branch immediate instruction has an immediate that
> encodes an offset of +/-32MB from the branch instruction. A range
> extension thunk is a linker-generated code sequence, inserted between
> the caller and the callee, that completes the transfer of control to
> the callee when the distance between the caller and callee exceeds the
> range of the branch instruction. A simple example in pseudo-assembly
> for a non-position-independent ARM function call:
>
> source:
>     BL    long_range_thunk_to_target
>     ...
> long_range_thunk_to_target:
>     LDR   r12, target_address  ; r12 (ip) is the corruptible intra-procedure-call scratch register
>     BX    r12
> target_address:
>     .word target
>     ...
> target:
>     ...
>
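The +/-32MB figure follows from the A32 BL encoding: a signed 24-bit immediate counted in 4-byte words, relative to the PC, which reads as the branch address plus 8 in ARM state. A sketch of the resulting range check (function name hypothetical); an out-of-range result is exactly the case where the relocation would instead be pointed at a thunk such as long_range_thunk_to_target.

```python
def bl_imm24(branch_addr, target_addr):
    """Encode the imm24 field of an ARM-state BL, or raise if unreachable."""
    offset = target_addr - (branch_addr + 8)  # PC reads 8 bytes ahead
    if offset % 4:
        raise ValueError("target is not word aligned")
    imm = offset >> 2  # exact: offset is a multiple of 4
    if not -(1 << 23) <= imm < (1 << 23):
        raise ValueError("out of range: a range extension thunk is needed")
    return imm & 0xFFFFFF  # two's complement 24-bit field

forward_max = bl_imm24(0, 8 + 0x1FFFFFC)  # furthest forward target
backward = bl_imm24(0x8000, 0x8000)       # branch to itself: offset -8
```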
> What is an errata patching thunk?
> Some CPU errata (hardware bugs) can be fixed at link time by
> replacing an instruction with a branch to a sequence of equivalent
> instructions that are guaranteed not to trigger the erratum. In
> some cases the trigger sequence depends on precise addresses, such
> as immediates crossing page boundaries; for example
> https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html . Errata
> patching is out of the scope of implementing range extension thunks
> but can be seen as a generalization of it.
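
Why such fixes need final addresses, as a sketch: whether a code sequence straddles a page boundary can flip when it moves by only a few bytes. The 4KiB page size and the straddling test are illustrative assumptions, not the trigger condition of any particular erratum.

```python
PAGE_SIZE = 4096  # assumed page size for illustration

def spans_page_boundary(addr, length, page_size=PAGE_SIZE):
    """True if the bytes [addr, addr + length) touch two different pages."""
    return addr // page_size != (addr + length - 1) // page_size

before = spans_page_boundary(0xFF0, 16)  # ends exactly at 0xFFF
after = spans_page_boundary(0xFF4, 16)   # moved 4 bytes by new content
```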