[llvm-dev] RFC: LLD range extension thunks

Fri Jan 6 14:33:07 PST 2017

After looking at this for a while, I do not think that this problem is
NP-hard. With a finite "short branch" displacement k, I was not able to
come up with a gadget that could create global constraints as would be
needed to e.g. model an instance of 3SAT or vertex cover in terms of this
problem.

The problem is hard though. I believe that it is likely to be exponential
in the "short branch" displacement k, and k is typically "pretty big".

-- Sean Silva

On Fri, Jan 6, 2017 at 1:12 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Fri, Jan 6, 2017 at 12:41 AM, Bruce Hoult via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Fri, Jan 6, 2017 at 6:21 AM, Rui Ueyama via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Thu, Jan 5, 2017 at 8:15 PM, Peter Smith <peter.smith at linaro.org>
>>> wrote:
>>>
>>>> Hello Rui,
>>>>
>>>> Thanks for the comments
>>>>
>>>> - Synthetic sections and rewriting relocations
>>>> I think that this would definitely be worth trying. It should remove
>>>> the need for thunks to be represented in the core data structures, and
>>>> would allow .
>>>>
>>>
>>> Creating symbols for thunks would have another benefit: it makes
>>> disassembled output easier to read because thunks have names.
>>>
>>>
>>>> It would also mean that we wouldn't have to associate symbols with
>>>> thunks as the relocations would directly target the thunks. ARM
>>>> interworking makes reusing thunks more difficult as not every thunk is
>>>> compatible with every caller. For example:
>>>> ARM B target and Thumb2 B.W target can't reuse the same thunk even if
>>>> in range as the branch instruction can't change state.
>>>>
>>>> I think it is worth an experiment to make the existing implementation
>>>> of thunks use synthetic sections and rewriting relocations before
>>>> trying to implement range extension thunks.
>>>>
>>>> - Yes the scan is linear it is essentially:
>>>> do
>>>>     assign addresses to input sections
>>>>     for each relocation
>>>>         if (thunk needed)
>>>>             create thunk or reuse existing one
>>>> while (no more thunks added)
>>>>
>>>> There's quite a lot of complexity that can be added with respect to
>>>> the placement of thunks within the output section. For example if
>>>> there is a caller with a low address and a caller with a high address,
>>>> both might be able to reuse a thunk placed in the middle. I think it
>>>> is worth starting simple though.
>>>
>>>
>>> I agree. I believe that computing the best thunk positions is NP-hard,
>>> but the best layout and a layout produced by a naive algorithm wouldn't be
>>> that different.
>>>
>>
>> Correct conclusion, but there's no way the problem is NP.
>>
>> Reordering functions (or even individual basic blocks) to minimize the
>> needed thunks is a complex problem.
>>
>> But you're not doing that. Once an ordering is selected a simple greedy
>> algorithm is optimal.
>>
>> There is no cost difference between a thunk that is right next to the
>> short jump and a thunk that is only juuust within range. So you can find
>> the lowest address jump needing a thunk to a particular target and put the
>> thunk the maximum possible distance after it (after the end of a function,
>> or even after any unconditional branch). Find everything else within range
>> of that thunk and fix it up. Repeat.
>>
>
> I don't think this analysis is correct. Assume a 1M branch displacement
> for short jumps. Consider:
>
> secA: 512K (contains a jump "jumpA" at offset 0 that needs a thunk (it
> doesn't matter where it needs to jump to, just that it definitely needs a
> thunk))
> secB: 512K
> secC: 512K (contains a jump "jumpC" at offset 0 that jumps to offset 0 in
> secA (i.e., just barely in range of a short jump))
>
> If the thunk for jumpA is placed between secB and secC (as it would be
> based on your description) it will push the branch at the beginning of secC
> out of range, forcing another thunk to be needed. In this small example,
> the thunk for jumpA must be placed before secA in order to avoid needing a
> thunk for jumpC. In other words, placing thunks can cause you to need even
> more thunks.
>
> -- Sean Silva
>
>
>
>>
>> Other algorithms will give smaller average displacements to the thunks,
>> but there is no advantage in that. No other algorithm will generate fewer
>> thunks.
>>
>> That's assuming all short branches have the same code size and
>> displacement distance.
>>
>> If there are multiple branch distances and code sizes (and you have a
>> choice between them at given call sites) then it's still just a simple
>> dynamic programming problem, solvable in linear [1] time by trying each
>> branch size at the first available call site, with a cache of the minimum
>> cost assuming the first 0, 1, 2 .. N call sites have already been covered.
>>
>> [1] or at least nCallSites * nBranchSizes
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170106/507ffbfa/attachment.html>