<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 6, 2017 at 6:21 AM, Rui Ueyama via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-">On Thu, Jan 5, 2017 at 8:15 PM, Peter Smith <span dir="ltr"><<a href="mailto:peter.smith@linaro.org" target="_blank">peter.smith@linaro.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hello Rui,<br>

<br>

Thanks for the comments<br>

<br>

- Synthetic sections and rewriting relocations<br>

I think that this would definitely be worth trying. It should remove<br>

the need for thunks to be represented in the core data structures, and<br>

would allow .<br></blockquote><div><br></div></span><div>Creating symbols for thunks would have another benefit: it makes disassembled output easier to read because thunks have names.</div><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

It would also mean that we wouldn't have to associate symbols with<br>

thunks as the relocations would directly target the thunks. ARM<br>

interworking makes reusing thunks more difficult as not every thunk is<br>

compatible with every caller. For example:<br>

ARM B target and Thumb2 B.W target can't reuse the same thunk even if<br>

in range as the branch instruction can't change state.<br>

<br>

I think it is worth an experiment to make the existing implementation<br>

of thunks use synthetic sections and rewriting relocations before<br>

trying to implement range extension thunks.<br>

<br>

- Yes the scan is linear it is essentially:<br>

do<br>

    assign addresses to input sections<br>

    for each relocation<br>

        if (thunk needed)<br>

            create thunk or reuse existing one<br>

while (no more thunks added)<br>

<br>

There's quite a lot of complexity that can be added with respect to<br>

the placement of thunks within the output section. For example if<br>

there is a caller with a low address and a caller with a high address,<br>

both might be able to reuse a thunk placed in the middle. I think it<br>

is worth starting simple though.</blockquote><div><br></div></span><div>I agree. I believe that computing the best thunk positions is NP-hard, but the best layout and a layout produced by a naive algorithm wouldn't be that different.</div></div></div></div></blockquote><div><br></div><div style="font-size:12.8px">Correct conclusion, but there's no way the problem is NP.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Reordering functions (or even individual basic blocks) to minimize the needed thunks is a complex problem.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">But you're not doing that. Once an ordering is selected a simple greedy algorithm is optimal.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">There is no cost difference between a thunk that is right next to the short jump and a thunk that is only juuust within range. So you can find the lowest address jump needing a thunk to a particular target and put the thunk the maximum possible distance after it (after the end of a function, or even after any unconditional branch). Find everything else within range of that thunk and fix it up. Repeat.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Other algorithms will give smaller average displacements to the thunks, but there is no advantage in that. No other algorithm will generate fewer thunks.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">That's assuming all short branches have the same code size and displacement distance.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">If there are multiple branch distances and code sizes (and you have a choice between them at given call sites) then it's still just a simple dynamic programming problem, solvable in linear [1] time by trying each branch size at the first available call site, with a cache of the minimum cost assuming the first 0, 1, 2 .. N call sites have already been covered.</div><div style="font-size:12.8px"><br></div><div><span style="font-size:12.8px">[1] or at least nCallSites * nBranchSizes</span></div><div> </div></div></div></div>