[llvm-dev] [LLD] Linker Relaxation
Sean Silva via llvm-dev
llvm-dev at lists.llvm.org
Tue Jul 11 19:48:50 PDT 2017
On Tue, Jul 11, 2017 at 11:21 AM, Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Tue, Jul 11, 2017 at 9:14 PM, Bruce Hoult via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Here's an example using the gcc toolchain for embedded 32 bit RISC-V (my
>> HiFive1 board):
>>
>> #include <stdio.h>
>>
>> int foo(int i){
>> if (i < 100){
>> printf("%d\n", i);
>> }
>> return i;
>> }
>>
>> int main(){
>> foo(10);
>> return 0;
>> }
>>
>> After compiling to a .o with -O2 -march=RV32IC we get (just looking at
>> foo)
>>
>> 00000000 <foo>:
>> 0: 1141 addi sp,sp,-16
>> 2: c422 sw s0,8(sp)
>> 4: c606 sw ra,12(sp)
>> 6: 06300793 li a5,99
>> a: 842a mv s0,a0
>> c: 00a7cb63 blt a5,a0,22 <.L2>
>> 10: 85aa mv a1,a0
>> 12: 00000537 lui a0,0x0
>> 16: 00050513 mv a0,a0
>> 1a: 00000317 auipc t1,0x0
>> 1e: 000300e7 jalr t1
>>
>> 00000022 <.L2>:
>> 22: 40b2 lw ra,12(sp)
>> 24: 8522 mv a0,s0
>> 26: 4422 lw s0,8(sp)
>> 28: 0141 addi sp,sp,16
>> 2a: 8082 ret
>>
>> And after linking:
>>
>> 00010164 <foo>:
>> 10164: 1141 addi sp,sp,-16
>> 10166: c422 sw s0,8(sp)
>> 10168: c606 sw ra,12(sp)
>> 1016a: 06300793 li a5,99
>> 1016e: 842a mv s0,a0
>> 10170: 00a7c863 blt a5,a0,10180 <foo+0x1c>
>> 10174: 85aa mv a1,a0
>> 10176: 0001a537 lui a0,0x1a
>> 1017a: 6a050513 addi a0,a0,1696 # 1a6a0
>> <__clz_tab+0x100>
>> 1017e: 2a69 jal 10318 <printf>
>> 10180: 40b2 lw ra,12(sp)
>> 10182: 8522 mv a0,s0
>> 10184: 4422 lw s0,8(sp)
>> 10186: 0141 addi sp,sp,16
>> 10188: 8082 ret
>>
>> The linker has done quite a lot!
>>
>> 1) the format string address generation has had the LUI (Load Upper
>> Immediate)
>> changed from 0x0 to 0x1a (the literal is in flash memory). If it had
>> stayed at
>> 0x0 it would have been removed by the linker. The mv a0,a0 (which is
>> really
>> addi a0,a0,#0) has had the real immediate filled in.
>>
>> 2) the call of printf had the general call-anywhere-in-the-address-space
>> auipc
>> (Add Upper Immediate to PC); jalr (Jump And Link to address in Register
>> (plus
>> offset)) sequence replaced by a simple jal (Jump And Link, with PC +/- 1
>> MB range)
>>
>> 3) as the jal offset was in fact less than +/- 2 KB, the 32 bit jal was
>> replaced by a
>> 16 bit jal instruction.
>>
>> 4) the conditional branch has been shortened from 18 bytes to 12 bytes
>> due to
>> the other changes.
>>
>
> Thanks, Bruce. This is a very interesting optimization.
>
> lld doesn't currently have code to support that kind of code shrinking
> optimization, but we can definitely add it. It seems that essentially we
> need to iterate over all relocations while rewriting instructions until a
> convergence is obtained. But the point is how to do do it efficiently --
> link speed really matters. I can't come up with an algorithm to parallelize
> it. Do you have any idea?
>
Essentially, you loop:
1. create a prospective address assignment / layout (basically a cumulative
sum operation on the section sizes)
2. visit all relocations in parallel, modifying the relocation / content
for eligible relocations. Practically, input sections would be visited in
parallel (instead of relocations) so that the changes to the input section
content are done serially with a single linear scan that avoids doing
O(input section size) work per modification of the section content.
To terminate the algorithm, you stop after step 1 and just calculate the
exact offsets for the layout.
Note that since offsets can only be decreased by step 2, modifications to
relocations/content always remain valid.
Parallelizing cumulative sum type calculations are well-studied in the
context of GPU's and hardware reduction trees. See e.g.
https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch39.html
Note that we could in theory visit the relocations in parallel in step 2,
but then the changes to section's content would require a more complicated
algorithm, with each relaxation producing a size delta value that is then
accumulated with a cumulative sum type operation to coalesce the input
section and remove dead space. This type of pattern is common on GPU's;
e.g. AMD GPUs have a special instruction ds_ordered_count which is
optimized for precisely this operation; e.g. slide 42 of
https://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf
-- Sean Silva
>
> In order to shrink instructions, all address references must be explicitly
> represented as relocations even if they are in the same section. I think
> that means object files for RISC-V have many more relocations than the
> other architectures. Is this correct?
>
>
>>
>
>> On Tue, Jul 11, 2017 at 1:59 PM, Peter Smith via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hello,
>>>
>>> To the best of my knowledge I think the closest analogue is something
>>> like the Synthetic EHFrame and MergeInputSections, not strictly code
>>> relaxation, but these do involve changes in size of sections.
>>>
>>> Can I ask you a quick question? In many architectures not all
>>> pc-relative offsets are exposed to the linker as relocations so it
>>> isn't safe to change the sizes of sections in arbitrary places [*];
>>> does the RiscV ABI have restrictions on code-generation to allow a
>>> linker to reduce the code-size of a code-sequence within a Section? If
>>> there are constraints on the relaxations it might help us make a
>>> suggestion.
>>>
>>> I'm assuming that if you are doing some kind of range based relaxation
>>> you'll need something like range extension thunks (I'm working on
>>> these right now) this means you'll probably have to do your
>>> calculations on what you can relax in finalizeSections() at a similar
>>> point to createThunks(), or perhaps the mechanisms would need to be
>>> merged as I think they'll need to converge on a point when no more
>>> relaxations are possible and no more thunks can be added.
>>>
>>> Writing out the relaxed sections will be interesting as you won't want
>>> all of the InputSectionContents. I suggest looking at EHFrame and
>>> MergeInputSections for ideas.
>>>
>>> Hope that is of some use
>>>
>>> Peter
>>>
>>> [*] For example in pseudo ARM
>>>
>>> ldr r0, [pc, offset] ; where pc + offset == label
>>> ...
>>> relaxable sequence such as an indirect jump via a register
>>> ...
>>> label: .word foo
>>>
>>> If the compiler/assembler has pre-computed the offset to label then
>>> changing the size of the relaxable sequence without also updating the
>>> offset will break the program.
>>>
>>>
>>>
>>> On 11 July 2017 at 11:09, PkmX via llvm-dev <llvm-dev at lists.llvm.org>
>>> wrote:
>>> > Hi,
>>> >
>>> > Does lld support linker relaxation that may shrink code size? As far
>>> > as I see lld seems to assume that the content of input sections to be
>>> > fixed other than patching up relocations, but I believe some targets
>>> > may benefit the extra optimization opportunity with relaxation.
>>> > Specifically, I'm currently working on adding support for RISC-V in
>>> > lld, and RISC-V heavily relies on linker relaxation to remove
>>> > extraneous code and to handle alignment. Since linker relaxation may
>>> > be of interest to other targets as well, I'm wondering what would be a
>>> > good way to modify lld to support that. Thanks.
>>> >
>>> > --
>>> > Chih-Mao Chen (PkmX)
>>> > Software R&D, Andes Technology
>>> > _______________________________________________
>>> > LLVM Developers mailing list
>>> > llvm-dev at lists.llvm.org
>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170711/791d09d2/attachment-0001.html>
More information about the llvm-dev
mailing list