[lld] Dealing with limited branch reach?

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 20 17:55:41 PDT 2015


On Tue, Oct 20, 2015 at 5:45 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Tue, Oct 20, 2015 at 5:36 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Tue, Oct 20, 2015 at 4:56 PM, Hal Finkel via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Hi Rui, Rafael, et al.,
>>>
>>> In order to move PPC64 support in lld to a point where it can self host,
>>> we need to deal with the following problem:
>>>
>>> On PPC, a relative branch can only have a signed 24-bit displacement
>>> (really a 26-bit signed byte displacement, once the two implied zero
>>> low-order bits are appended). Thus, the range is limited to about +/- 32
>>> megabytes, and if there is more code than that, we need to make other
>>> arrangements.
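To make the range arithmetic concrete, here is a minimal C++ sketch (not lld code; decodeBranchDisp and branchInRange are hypothetical names) that decodes the displacement of an I-form branch and checks whether a target is reachable. It assumes the standard PPC64 I-form layout, in which the 24-bit LI field occupies bits 2..25 of the instruction word (counting from the least significant bit), so masking with 0x03fffffc yields the byte displacement directly.

```cpp
#include <cassert>
#include <cstdint>

// PPC64 I-form branch ("b"/"bl"): the 24-bit LI field sits in bits 2..25
// (LSB = bit 0); bits 0 and 1 are LK/AA, so the byte displacement is a
// 26-bit signed value whose two low bits are implied zeros.
constexpr int64_t kMinBranchDisp = -(int64_t{1} << 25);     // -32 MiB
constexpr int64_t kMaxBranchDisp = (int64_t{1} << 25) - 4;  // +32 MiB - 4

// Extract the signed byte displacement from an encoded I-form branch.
int64_t decodeBranchDisp(uint32_t insn) {
  int64_t disp = insn & 0x03fffffcu;          // LI field, already scaled by 4
  if (disp & 0x02000000) disp -= 0x04000000;  // sign-extend from 26 bits
  return disp;
}

// Can a relative branch at address `from` reach `to` without a stub?
bool branchInRange(uint64_t from, uint64_t to) {
  int64_t disp = static_cast<int64_t>(to - from);
  if (disp & 3) return false;  // the displacement must be a multiple of 4
  return disp >= kMinBranchDisp && disp <= kMaxBranchDisp;
}
```

For example, decoding the bl in the objdump listing below (4b ff ff dd at 0x10000524) gives -36, which lands exactly on the stub at 0x10000500.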
>>>
>>> As I understand it, other architectures (AArch64, for example) have
>>> similar limitations.
>>>
>>> Existing linkers handle this situation by inserting branch stubs, and
>>> placing the branch stubs close enough to the call sites.
>>>
>>> Here's a quick example:
>>>
>>> $ cat main.c
>>> void foo();
>>> int main() {
>>>   foo();
>>>   asm(".fill 50000000, 4, 0x60000000"); // 50M nops (~200 MB), far beyond branch reach
>>>   return 0;
>>> }
>>>
>>> $ cat foo.c
>>> void foo() {}
>>>
>>> $ gcc -o btest main.c foo.c
>>>
>>> Now running objdump -d btest shows this relevant bit:
>>>
>>> 0000000010000500 <0000003a.plt_branch.foo+0>:
>>>     10000500:   3d 82 ff ff     addis   r12,r2,-1
>>>     10000504:   e9 6c 7f e8     ld      r11,32744(r12)
>>>     10000508:   7d 69 03 a6     mtctr   r11
>>>     1000050c:   4e 80 04 20     bctr
>>>
>>> 0000000010000510 <.main>:
>>>     10000510:   7c 08 02 a6     mflr    r0
>>>     10000514:   f8 01 00 10     std     r0,16(r1)
>>>     10000518:   fb e1 ff f8     std     r31,-8(r1)
>>>     1000051c:   f8 21 ff 81     stdu    r1,-128(r1)
>>>     10000520:   7c 3f 0b 78     mr      r31,r1
>>>     10000524:   4b ff ff dd     bl      10000500 <0000003a.plt_branch.foo+0>
>>>     10000528:   60 00 00 00     nop
>>>     1000052c:   60 00 00 00     nop
>>>     10000530:   60 00 00 00     nop
>>>     10000534:   60 00 00 00     nop
>>> ...
>>>
>>> So the linker has placed the actual call target address in a data
>>> section (addressed relative to the TOC base pointer in r2), and the stub
>>> loads that address, moves it into the count register, and jumps there.
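The stub's two-instruction address computation uses the usual PPC64 high-adjusted/low split of a 32-bit offset. The C++ sketch below (splitHaLo and joinHaLo are hypothetical helpers, not taken from any linker) shows that split: because ld sign-extends its 16-bit displacement, the high half must be rounded up whenever bit 15 of the offset is set. Applied to the stub in the listing, addis r12,r2,-1 followed by ld r11,32744(r12) reads the TOC entry at r2 - 65536 + 32744 = r2 - 32792.

```cpp
#include <cassert>
#include <cstdint>

// Immediates for an "addis rT,r2,ha" + "ld rX,lo(rT)" pair.
struct HaLo {
  int16_t ha;  // added to r2 after being shifted left by 16 (addis)
  int16_t lo;  // sign-extended 16-bit displacement (ld)
};

// Split a signed offset from the TOC pointer into @ha/@l parts.
// Unsigned arithmetic avoids shifting negative signed values.
// (ld is DS-form, so the offset must also be a multiple of 4.)
HaLo splitHaLo(int32_t offset) {
  uint32_t u = static_cast<uint32_t>(offset);
  HaLo r;
  r.lo = static_cast<int16_t>(u & 0xffffu);
  // Adding 0x8000 before taking the high half compensates for a negative lo.
  r.ha = static_cast<int16_t>(((u + 0x8000u) >> 16) & 0xffffu);
  return r;
}

// Offset from r2 that the addis/ld pair actually addresses.
int32_t joinHaLo(HaLo v) { return int32_t{v.ha} * 65536 + v.lo; }
```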
>>>
>>> Currently, lld seems to write the input sections of each output section
>>> consecutively, in order, into that output section. Dealing properly with
>>> long-branch stubs, however, seems to require inserting stub sections in
>>> between the other .text sections. This affects not only direct calls but
>>> also calls into the .plt, since those too must be in range: either they
>>> also go through stubs, or we need to split (and perhaps duplicate) the
>>> .plt entries to keep them close enough.
>>>
>>> One possible way to do this is:
>>>
>>>  if (total size < some threshold) {
>>>    everything will fit, so do what we do now
>>>  } else {
>>>    group the input text sections so that each group (including the size
>>> of its stubs) is below the threshold (we can scan each section for branch
>>> relocations to determine whether stubs are necessary)
>>>    insert the necessary stub sections after each group
>>>  }
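One way to turn the pseudocode above into code is a single greedy pass in input order. The C++ sketch below is only an illustration under simplifying assumptions: TextSection, Group, and groupSections are hypothetical stand-ins rather than lld types, every branch target not defined inside a group gets its own 16-byte stub (the size of the four-instruction plt_branch stub in the listing), and a section that alone exceeds the threshold simply becomes a group of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for an input .text section: only its size and the
// symbols it defines or branches to matter here.
struct TextSection {
  uint64_t size;
  std::vector<int> defines;  // symbol ids defined in this section
  std::vector<int> calls;    // symbol ids referenced by branch relocations
};

// A run of consecutive sections followed by one stub area.
struct Group {
  std::vector<size_t> members;  // indices into the input vector
  size_t numStubs = 0;          // one stub per distinct out-of-group target
};

constexpr uint64_t kStubSize = 16;  // 4 instructions, like plt_branch.foo

// Stubs a candidate group would need: one per branch target that is not
// defined inside the group.
static size_t countStubs(const std::vector<TextSection> &secs,
                         const std::vector<size_t> &members) {
  std::set<int> defined, external;
  for (size_t i : members)
    defined.insert(secs[i].defines.begin(), secs[i].defines.end());
  for (size_t i : members)
    for (int sym : secs[i].calls)
      if (!defined.count(sym)) external.insert(sym);
  return external.size();
}

std::vector<Group> groupSections(const std::vector<TextSection> &secs,
                                 uint64_t threshold) {
  uint64_t total = 0;
  for (const TextSection &s : secs) total += s.size;
  if (total < threshold) {  // everything fits: no stubs, as today
    Group all;
    for (size_t i = 0; i < secs.size(); ++i) all.members.push_back(i);
    return {all};
  }

  std::vector<Group> groups;
  Group cur;
  uint64_t codeSize = 0;  // code bytes in `cur`, excluding its stubs
  for (size_t i = 0; i < secs.size(); ++i) {
    std::vector<size_t> trial = cur.members;
    trial.push_back(i);
    size_t stubs = countStubs(secs, trial);
    if (!cur.members.empty() &&
        codeSize + secs[i].size + stubs * kStubSize > threshold) {
      groups.push_back(cur);  // close the group; its stubs follow it
      cur = Group{};
      codeSize = 0;
      trial = {i};  // an oversized section still ends up alone in a group
      stubs = countStubs(secs, trial);
    }
    cur.members = trial;
    cur.numStubs = stubs;
    codeSize += secs[i].size;
  }
  if (!cur.members.empty()) groups.push_back(cur);
  return groups;
}
```

Recomputing the stub count for every candidate is quadratic in group size; a real implementation would more likely update the defined and external sets incrementally.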
>>>
>>> Various heuristics can make the groupings chosen more or less optimal,
>>> but perhaps that's another matter.
>>>
>>> Thoughts?
>>>
>>
>> Could we have an OutputSection subclass whose finalize() method does this
>> computation and edits its `std::vector<InputSection<ELFT> *> Sections` by
>> inserting "phony" input sections and rewriting relocations? That way, the
>> core layout algorithm is unaffected.
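A minimal sketch of that idea, assuming heavily simplified hypothetical classes (the real lld InputSection<ELFT> and OutputSection<ELFT> differ, and the stub symbol naming here is invented): the subclass's finalize() cuts Sections into groups by code size, appends a phony stub section after each group that needs one, and retargets each branch to a symbol outside its group at that group's stub. GroupLimit is assumed to already leave headroom for the stub area.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for lld's classes.
struct Reloc {
  std::string target;  // symbol the branch refers to
};

struct InputSection {
  std::string name;
  uint64_t size = 0;
  std::vector<std::string> defines;      // symbols defined in this section
  std::vector<Reloc> branches;           // branch relocations
  std::vector<std::string> stubTargets;  // only used by phony stub sections
};

class OutputSection {
public:
  virtual ~OutputSection() = default;
  virtual void finalize() {}
  std::vector<InputSection *> Sections;
};

constexpr uint64_t kStubSize = 16;  // 4 instructions per long-branch stub

class StubbingOutputSection : public OutputSection {
public:
  explicit StubbingOutputSection(uint64_t groupLimit) : GroupLimit(groupLimit) {}

  void finalize() override {
    std::vector<InputSection *> out, group;
    uint64_t groupSize = 0;
    for (InputSection *s : Sections) {
      if (!group.empty() && groupSize + s->size > GroupLimit) {
        flushGroup(group, out);
        groupSize = 0;
      }
      group.push_back(s);
      groupSize += s->size;
    }
    flushGroup(group, out);
    Sections = std::move(out);  // generic layout just sees a longer list
  }

private:
  // Emit the group, then a phony stub section for its out-of-group targets.
  void flushGroup(std::vector<InputSection *> &group,
                  std::vector<InputSection *> &out) {
    if (group.empty()) return;
    std::set<std::string> defined;
    for (InputSection *s : group)
      defined.insert(s->defines.begin(), s->defines.end());

    auto stub = std::make_unique<InputSection>();
    stub->name = ".text.stubs." + std::to_string(Stubs.size());
    std::set<std::string> seen;
    for (InputSection *s : group) {
      out.push_back(s);
      for (Reloc &r : s->branches) {
        if (defined.count(r.target)) continue;  // intra-group: branch directly
        if (seen.insert(r.target).second) stub->stubTargets.push_back(r.target);
        r.target = r.target + "@" + stub->name;  // hypothetical stub symbol
      }
    }
    stub->size = stub->stubTargets.size() * kStubSize;
    if (stub->size > 0) out.push_back(stub.get());
    Stubs.push_back(std::move(stub));  // keep phony sections alive
    group.clear();
  }

  uint64_t GroupLimit;
  std::vector<std::unique_ptr<InputSection>> Stubs;
};
```

Like the proposal, this routes every inter-group call through a stub even when the destination would be in range; the generic layout code only ever sees a longer Sections list.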
>>
>
> That algorithm is suboptimal because inter-group calls always go through
> stubs, even if the call destination is quite close in memory, no?
>

I don't think we should expect "optimal" because this problem looks
NP-hard. So it is a matter of balancing optimality with ease of
implementation/maintenance. Using finalize() is a pretty simple solution.

-- Sean Silva


>
>
>> -- Sean Silva
>>
>>
>>>
>>> Thanks again,
>>> Hal
>>>
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>
>>
>