[llvm-dev] RFC: LLD range extension thunks

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 6 15:12:30 PST 2017


On Fri, Jan 6, 2017 at 2:26 PM, Bruce Hoult <bruce at hoult.org> wrote:

> Sure, that's true. I was considering placing thunks from a lot of
> different origins to the same target. Your example is for different
> targets. Definitely that's a much harder problem.
>
> With 1M offsets this is not going to happen often enough to worry about
> and perfect optimality is probably not worth trying for.
>

Yes, optimality isn't worth trying for. Hopefully LLD will someday reorder
functions in peak-performance builds so that hot edges typically won't need
thunks. A lightweight heuristic for minimizing the extra TLB footprint due
to thunks might still be worthwhile, though.


>
> It's a different matter if you only have +/-128B offsets! Then it will be
> extremely common. But at the same time there will be 8000 times fewer
> possible placements of each thunk, and the range of influence of each
> decision will be very small, making a dynamic programming approach very
> fast at finding the optimal solution.
>

I don't see how one would use dynamic programming for finding an optimal
solution to this problem because there doesn't seem to be optimal
substructure. In other words, for dynamic programming to be applicable, you
must be able to make a decision using local information and know that it is
optimal and that you will never need to reconsider it. It's not clear to me
how a smaller branch displacement inherently allows deciding that a choice
is optimal based solely on local information.

-- Sean Silva


> A bigger problem is the basic block might be larger than the offset range,
> in which case you need to modify the BB to put the thunk inline (not
> actually a thunk any more).
>
> On Sat, Jan 7, 2017 at 12:12 AM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Fri, Jan 6, 2017 at 12:41 AM, Bruce Hoult via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Fri, Jan 6, 2017 at 6:21 AM, Rui Ueyama via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> On Thu, Jan 5, 2017 at 8:15 PM, Peter Smith <peter.smith at linaro.org>
>>>> wrote:
>>>>
>>>>> Hello Rui,
>>>>>
>>>>> Thanks for the comments
>>>>>
>>>>> - Synthetic sections and rewriting relocations
>>>>> I think that this would definitely be worth trying. It should remove
>>>>> the need for thunks to be represented in the core data structures, and
>>>>> would allow .
>>>>>
>>>>
>>>> Creating symbols for thunks would have another benefit: it makes
>>>> disassembled output easier to read because thunks have names.
>>>>
>>>>
>>>>> It would also mean that we wouldn't have to associate symbols with
>>>>> thunks as the relocations would directly target the thunks. ARM
>>>>> interworking makes reusing thunks more difficult as not every thunk is
>>>>> compatible with every caller. For example:
>>>>> ARM B target and Thumb2 B.W target can't reuse the same thunk even if
>>>>> in range as the branch instruction can't change state.
>>>>>
>>>>> I think it is worth an experiment to make the existing implementation
>>>>> of thunks use synthetic sections and rewriting relocations before
>>>>> trying to implement range extension thunks.
>>>>>
>>>>> - Yes the scan is linear it is essentially:
>>>>> do
>>>>>     assign addresses to input sections
>>>>>     for each relocation
>>>>>         if (thunk needed)
>>>>>             create thunk or reuse existing one
>>>>> while (thunks were added this pass)
>>>>>
>>>>> There's quite a lot of complexity that can be added with respect to
>>>>> the placement of thunks within the output section. For example if
>>>>> there is a caller with a low address and a caller with a high address,
>>>>> both might be able to reuse a thunk placed in the middle. I think it
>>>>> is worth starting simple though.
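Peter's fixed-point scan can be sketched as a minimal Python loop. The section model, the 8-byte thunk size, and the placement of each thunk directly after its calling section are simplifying assumptions for illustration, not LLD's actual data structures:

```python
def place_thunks(sections, branch_range):
    """Iteratively assign addresses and insert thunks until stable.

    sections: list of dicts with 'size' (bytes) and 'calls' (target
    section indices).  Thunks are modeled as 8-byte stubs appended
    after the calling section.
    """
    thunks = {}  # (caller_index, target_index) -> thunk address
    while True:
        # Assign addresses to input sections plus any thunks created so far.
        addr, addresses = 0, []
        for i, sec in enumerate(sections):
            addresses.append(addr)
            addr += sec['size'] + 8 * sum(1 for k in thunks if k[0] == i)
        added = False
        # For each call, create a thunk if the target is out of range
        # and no thunk for this (caller, target) pair exists yet.
        for i, sec in enumerate(sections):
            for tgt in sec['calls']:
                if (abs(addresses[tgt] - addresses[i]) > branch_range
                        and (i, tgt) not in thunks):
                    thunks[(i, tgt)] = addresses[i] + sec['size']
                    added = True
        if not added:  # fixed point: no thunks added this pass
            return thunks
```

Note that adding a thunk grows the layout, which is why the loop must re-run address assignment until no new thunks appear.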
>>>>
>>>>
>>>> I agree. I believe that computing the best thunk positions is NP-hard,
>>>> but the best layout and a layout produced by a naive algorithm wouldn't be
>>>> that different.
>>>>
>>>
>>> Correct conclusion, but there's no way the problem is NP-hard.
>>>
>>> Reordering functions (or even individual basic blocks) to minimize the
>>> needed thunks is a complex problem.
>>>
>>> But you're not doing that. Once an ordering is selected a simple greedy
>>> algorithm is optimal.
>>>
>>> There is no cost difference between a thunk that is right next to the
>>> short jump and a thunk that is only juuust within range. So you can find
>>> the lowest address jump needing a thunk to a particular target and put the
>>> thunk the maximum possible distance after it (after the end of a function,
>>> or even after any unconditional branch). Find everything else within range
>>> of that thunk and fix it up. Repeat.
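The greedy scheme described above (place each thunk at the maximum reachable distance past the lowest-address uncovered jump, then sweep up everything else in range) might be sketched as follows, under the simplifying assumption that all jumps share a single out-of-range target:

```python
def greedy_thunks(jumps, branch_range):
    """Greedy thunk placement for short jumps to one out-of-range target.

    jumps: sorted addresses of jumps that all need a thunk.  A thunk is
    placed as far forward as the lowest-address uncovered jump can reach;
    every subsequent jump within range of that thunk reuses it.
    Returns the chosen thunk addresses.
    """
    thunks, i = [], 0
    while i < len(jumps):
        thunk = jumps[i] + branch_range  # maximum reachable distance forward
        thunks.append(thunk)
        # Skip every later jump that can also reach this thunk.
        while i < len(jumps) and abs(thunk - jumps[i]) <= branch_range:
            i += 1
    return thunks
```

This is the classic interval-covering greedy; Sean's reply below questions whether it stays optimal once thunk insertion itself shifts addresses.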
>>>
>>
>> I don't think this analysis is correct. Assume a 1M branch displacement
>> for short jumps. Consider:
>>
>> secA: 512K (contains a jump "jumpA" at offset 0 that needs a thunk (it
>> doesn't matter where it needs to jump to, just that it definitely needs a
>> thunk))
>> secB: 512K
>> secC: 512K (contains a jump "jumpC" at offset 0 that jumps to offset 0 in
>> secA (i.e., just barely in range of a short jump))
>>
>> If the thunk for jumpA is placed between secB and secC (as it would be
>> based on your description) it will push the branch at the beginning of secC
>> out of range, forcing another thunk to be needed. In this small example,
>> the thunk for jumpA must be placed before secA in order to avoid needing a
>> thunk for jumpC. In other words, placing thunks can cause you to need even
>> more thunks.
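The arithmetic in this counterexample can be transcribed directly (the 8-byte thunk size is an assumption; any nonzero size triggers the same cascade):

```python
RANGE = 1 << 20   # 1M short-branch displacement
SEC = 512 * 1024  # each of secA, secB, secC is 512K
THUNK = 8         # assumed thunk size in bytes

# Layout: secA | secB | thunk-for-jumpA | secC.
# jumpC (at the start of secC) targets offset 0 in secA; before the
# thunk was inserted it was exactly at the edge of short-branch range.
jumpC = 2 * SEC + THUNK
assert jumpC - 0 > RANGE  # pushed out of range: a second thunk is needed

# Placing the thunk for jumpA *before* secA instead leaves the
# secA-to-secC distance unchanged, so jumpC stays in range.
jumpC_alt = 2 * SEC
assert jumpC_alt - 0 <= RANGE
```

The cascade is the crux of the argument: a placement decision can retroactively invalidate branches that were previously in range.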
>>
>> -- Sean Silva
>>
>>
>>
>>>
>>> Other algorithms will give smaller average displacements to the thunks,
>>> but there is no advantage in that. No other algorithm will generate fewer
>>> thunks.
>>>
>>> That's assuming all short branches have the same code size and
>>> displacement distance.
>>>
>>> If there are multiple branch distances and code sizes (and you have a
>>> choice between them at given call sites) then it's still just a simple
>>> dynamic programming problem, solvable in linear [1] time by trying each
>>> branch size at the first available call site, with a cache of the minimum
>>> cost assuming the first 0, 1, 2 .. N call sites have already been covered.
>>>
>>> [1] or at least nCallSites * nBranchSizes
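A sketch of that dynamic program, with an invented cost model: each branch kind is a (code_size, reach) pair, a kind is usable at a call site only if it reaches the target, and the memo caches the minimum cost once the first i call sites are covered (so the work is at most nCallSites * nBranchSizes):

```python
from functools import lru_cache

def min_total_size(call_sites, branch_kinds):
    """Minimum total bytes of branch code across all call sites.

    call_sites: list of (address, target) pairs, sorted by address.
    branch_kinds: list of (code_size, max_displacement) pairs.
    Assumes at least one branch kind reaches every target.
    """
    @lru_cache(maxsize=None)
    def best(i):
        # Minimum cost given the first i call sites are already covered.
        if i == len(call_sites):
            return 0
        addr, target = call_sites[i]
        return min(size + best(i + 1)
                   for size, reach in branch_kinds
                   if abs(target - addr) <= reach)
    return best(0)
```

This sketch treats call sites as independent; modeling the interaction Sean describes (thunks shifting later addresses) would require carrying layout state in the DP key.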
>>>
>>>
>>>
>>>
>>
>
>
>