[lld] Dealing with limited branch reach?

Tue Oct 20 17:36:43 PDT 2015

On Tue, Oct 20, 2015 at 4:56 PM, Hal Finkel via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Hi Rui, Rafael, et al.,
>
> In order to move PPC64 support in lld to a point where it can self host,
> we need to deal with the following problem:
>
> On PPC, a relative branch can only have a signed 24-bit displacement
> (which is really a 26-bit signed displacement, once the two assumed
> lower-order bits are tacked on). Thus, the range is limited to +/- 32 MB,
> and if there is more code than that, we need to make other arrangements.
>
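
To make the reach limit concrete, here is a minimal sketch of the range check
a linker would need. It assumes the I-form branch encoding described above
(24-bit signed field, two implied zero low-order bits). The helper name and
constants are made up for illustration and are not lld code.

```cpp
#include <cassert>
#include <cstdint>

// A 24-bit signed field shifted left by two gives a 26-bit signed byte
// displacement: [-2^25, 2^25 - 4].
constexpr int64_t kMinBranchDisp = -(int64_t{1} << 25);    // -32 MiB
constexpr int64_t kMaxBranchDisp = (int64_t{1} << 25) - 4; // +32 MiB - 4

// Hypothetical helper: true if a relative branch located at `from` can reach
// `to` without going through a stub.
inline bool branchInRange(uint64_t from, uint64_t to) {
  assert(from % 4 == 0 && to % 4 == 0 && "PPC instructions are 4-byte aligned");
  int64_t disp = static_cast<int64_t>(to - from);
  return disp >= kMinBranchDisp && disp <= kMaxBranchDisp;
}
```

In the example below, the 50,000,000 four-byte nops span 200,000,000 bytes, so
a branch across them fails this check.
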
> As I understand it, other architectures (AArch64, for example) have
> similar limitations.
>
> Existing linkers handle this situation by inserting branch stubs, and
> placing the branch stubs close enough to the call sites.
>
> Here's a quick example:
>
> $ cat main.c
> void foo();
> int main() {
>   foo();
>   asm(".fill 50000000, 4, 0x60000000"); // lots of nops
>   return 0;
> }
>
> $ cat foo.c
> void foo() {}
>
> $ gcc -o btest main.c foo.c
>
> Now running objdump -d btest shows this relevant bit:
>
> 0000000010000500 <0000003a.plt_branch.foo+0>:
>     10000500:   3d 82 ff ff     addis   r12,r2,-1
>     10000504:   e9 6c 7f e8     ld      r11,32744(r12)
>     10000508:   7d 69 03 a6     mtctr   r11
>     1000050c:   4e 80 04 20     bctr
>
> 0000000010000510 <.main>:
>     10000510:   7c 08 02 a6     mflr    r0
>     10000514:   f8 01 00 10     std     r0,16(r1)
>     10000518:   fb e1 ff f8     std     r31,-8(r1)
>     1000051c:   f8 21 ff 81     stdu    r1,-128(r1)
>     10000520:   7c 3f 0b 78     mr      r31,r1
>     10000524:   4b ff ff dd     bl      10000500 <0000003a.plt_branch.foo+0>
>     10000528:   60 00 00 00     nop
>     1000052c:   60 00 00 00     nop
>     10000530:   60 00 00 00     nop
>     10000534:   60 00 00 00     nop
> ...
>
> So it has taken the actual call target address and stuck it in a data
> section (referenced from the TOC base pointer), and the stub loads the
> address and jumps there.
>
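
As a sanity check on the dump above, the `bl` word can be decoded by hand. The
sketch below assumes the I-form layout (primary opcode 18, AA bit clear); the
function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Compute the target of a relative PPC branch (b/bl with AA = 0) whose
// encoded word `insn` sits at address `pc`.
inline uint64_t decodeRelBranchTarget(uint64_t pc, uint32_t insn) {
  assert((insn >> 26) == 18 && "expected an I-form branch (opcode 18)");
  assert((insn & 2) == 0 && "absolute branches (AA = 1) are not handled");
  // Mask off the opcode and the AA/LK bits, leaving the 26-bit displacement.
  int64_t disp = insn & 0x03fffffc;
  if (disp & 0x02000000) // sign bit of the 26-bit displacement
    disp -= 0x04000000;
  return pc + disp;
}
```

For the word 0x4bffffdd at 0x10000524, the displacement is -36, which lands on
the stub at 0x10000500.
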
> Currently, lld seems to write each input section that is part of an output
> section, in order, consecutively into that output section. Dealing properly
> with long-branch stubs, however, seems to require inserting stub sections
> in between other .text sections. This affects not only direct calls but
> also calls into .plt (since they too need to be in range); otherwise we
> need to split the .plt (and, perhaps, duplicate .plt entries) to make
> sure its entries are close enough as well.
>
> One possible way to do this is:
>
>  if (total size < some threshold) {
>    everything will fit, so do what we do now
>  } else {
>    group the input text sections so that each group (including the size
>    of its stubs) is below the threshold (we can scan each section for
>    branch relocations to determine if stubs are necessary)
>    insert the necessary stub sections after each grouping
>  }
>
> Various heuristics can make the groupings chosen more or less optimal, but
> perhaps that's another matter.
>
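
A rough sketch of the grouping step, under simplifying assumptions: every
branch relocation is budgeted a stub (worst case), a stub is 16 bytes (four
instructions, as in the dump above), and sections are packed greedily in input
order. `TextSection` and `groupSections` are hypothetical names, not lld types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an input .text section.
struct TextSection {
  uint64_t Size;        // bytes of code
  unsigned NumBranches; // branch relocations that might need a stub
};

constexpr uint64_t kStubSize = 16; // addis + ld + mtctr + bctr

// Pack consecutive sections into groups whose code plus worst-case stubs
// stays within Threshold. Returns, for each group, the index one past its
// last section; a stub section would be inserted at each of those points.
inline std::vector<size_t> groupSections(const std::vector<TextSection> &Secs,
                                         uint64_t Threshold) {
  assert(Threshold > 0);
  std::vector<size_t> GroupEnds;
  uint64_t Cur = 0;
  for (size_t I = 0; I < Secs.size(); ++I) {
    uint64_t Need = Secs[I].Size + Secs[I].NumBranches * kStubSize;
    // Close the current group if this section would push it over the limit.
    if (Cur != 0 && Cur + Need > Threshold) {
      GroupEnds.push_back(I);
      Cur = 0;
    }
    Cur += Need; // a single oversized section still forms its own group
  }
  if (!Secs.empty())
    GroupEnds.push_back(Secs.size());
  return GroupEnds;
}
```

This ignores the better heuristics mentioned above; it only shows that the
grouping itself is a single linear pass over the input sections.
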
> Thoughts?
>

Could we have an OutputSection subclass whose finalize() method does this
computation and edits its `std::vector<InputSection<ELFT> *> Sections` by
inserting "phony" input sections and rewriting relocations? That way, the
core layout algorithm is unaffected.
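
Roughly what I have in mind, as a toy sketch only. The `Toy*` types below are
stand-ins written for this mail, not lld's actual classes, and the relocation
rewriting and stub sizing are left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins, not lld's real InputSection/OutputSection.
struct ToyInputSection {
  uint64_t Size;
  bool IsStub; // true for a "phony" section that would hold branch stubs
};

struct ToyOutputSection {
  std::vector<ToyInputSection *> Sections;
  virtual ~ToyOutputSection() = default;
  virtual void finalize() {} // default: keep the input order untouched
};

struct ToyStubbedTextSection : ToyOutputSection {
  uint64_t Threshold;
  std::vector<std::unique_ptr<ToyInputSection>> Stubs; // owns phony sections

  explicit ToyStubbedTextSection(uint64_t T) : Threshold(T) { assert(T > 0); }

  ToyInputSection *newStub() {
    Stubs.push_back(
        std::unique_ptr<ToyInputSection>(new ToyInputSection{0, true}));
    return Stubs.back().get();
  }

  void finalize() override {
    uint64_t Total = 0;
    for (ToyInputSection *S : Sections)
      Total += S->Size;
    if (Total < Threshold)
      return; // everything fits, so do what we do now

    // Otherwise close a group whenever the next section would overflow it,
    // and place a phony stub section after each group.
    std::vector<ToyInputSection *> Out;
    uint64_t Cur = 0;
    for (ToyInputSection *S : Sections) {
      if (Cur != 0 && Cur + S->Size > Threshold) {
        Out.push_back(newStub());
        Cur = 0;
      }
      Out.push_back(S);
      Cur += S->Size;
    }
    if (Cur != 0)
      Out.push_back(newStub());
    Sections = std::move(Out);
  }
};
```

The point is only that the edit stays local to `finalize()`: whatever walks
`Sections` afterwards sees the stub sections as ordinary entries.
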

-- Sean Silva

>
> Thanks again,
> Hal
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory