[llvm-dev] [LLD] Linker Relaxation

Tue Jul 11 19:48:50 PDT 2017

On Tue, Jul 11, 2017 at 11:21 AM, Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Tue, Jul 11, 2017 at 9:14 PM, Bruce Hoult via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Here's an example using the gcc toolchain for embedded 32 bit RISC-V (my
>> HiFive1 board):
>>
>> #include <stdio.h>
>>
>> int foo(int i){
>>     if (i < 100){
>>         printf("%d\n", i);
>>     }
>>     return i;
>> }
>>
>> int main(){
>>     foo(10);
>>     return 0;
>> }
>>
>> After compiling to a .o with -O2 -march=RV32IC we get (just looking at
>> foo)
>>
>> 00000000 <foo>:
>>    0: 1141                 addi sp,sp,-16
>>    2: c422                 sw s0,8(sp)
>>    4: c606                 sw ra,12(sp)
>>    6: 06300793           li a5,99
>>    a: 842a                 mv s0,a0
>>    c: 00a7cb63           blt a5,a0,22 <.L2>
>>   10: 85aa                 mv a1,a0
>>   12: 00000537           lui a0,0x0
>>   16: 00050513           mv a0,a0
>>   1a: 00000317           auipc t1,0x0
>>   1e: 000300e7           jalr t1
>>
>> 00000022 <.L2>:
>>   22: 40b2                 lw ra,12(sp)
>>   24: 8522                 mv a0,s0
>>   26: 4422                 lw s0,8(sp)
>>   28: 0141                 addi sp,sp,16
>>   2a: 8082                 ret
>>
>> And after linking:
>>
>> 00010164 <foo>:
>>    10164:       1141                    addi    sp,sp,-16
>>    10166:       c422                    sw      s0,8(sp)
>>    10168:       c606                    sw      ra,12(sp)
>>    1016a:       06300793                li      a5,99
>>    1016e:       842a                    mv      s0,a0
>>    10170:       00a7c863                blt     a5,a0,10180 <foo+0x1c>
>>    10174:       85aa                    mv      a1,a0
>>    10176:       0001a537                lui     a0,0x1a
>>    1017a:       6a050513                addi    a0,a0,1696 # 1a6a0
>> <__clz_tab+0x100>
>>    1017e:       2a69                    jal     10318 <printf>
>>    10180:       40b2                    lw      ra,12(sp)
>>    10182:       8522                    mv      a0,s0
>>    10184:       4422                    lw      s0,8(sp)
>>    10186:       0141                    addi    sp,sp,16
>>    10188:       8082                    ret
>>
>> The linker has done quite a lot!
>>
>> 1) the format string address generation has had the LUI (Load Upper
>> Immediate)
>> changed from 0x0 to 0x1a (the literal is in flash memory). If it had
>> stayed at
>> 0x0 it would have been removed by the linker. The mv a0,a0 (which is
>> really
>> addi a0,a0,#0) has had the real immediate filled in.
>>
>> 2) the call of printf had the general call-anywhere-in-the-address-space
>> auipc
>> (Add Upper Immediate to PC); jalr (Jump And Link to address in Register
>> (plus
>> offset)) sequence replaced by a simple jal (Jump And Link, with PC +/- 1
>> MB range)
>>
>> 3) as the jal offset was in fact less than +/- 2 KB, the 32 bit jal was
>> replaced by a
>> 16 bit jal instruction.
>>
>> 4) the conditional branch has been shortened from 18 bytes to 12 bytes
>> due to
>> the other changes.
>>
>
> Thanks, Bruce. This is a very interesting optimization.
>
> lld doesn't currently have code to support that kind of code shrinking
> optimization, but we can definitely add it. It seems that essentially we
> need to iterate over all relocations while rewriting instructions until a
> convergence is obtained. But the point is how to do do it efficiently --
> link speed really matters. I can't come up with an algorithm to parallelize
> it. Do you have any idea?
>

Essentially, you loop:
1. create a prospective address assignment / layout (basically a cumulative
sum operation on the section sizes)
2. visit all relocations in parallel, modifying the relocation / content
for eligible relocations. Practically, input sections would be visited in
parallel (instead of relocations) so that the changes to the input section
content are done serially with a single linear scan that avoids doing
O(input section size) work per modification of the section content.

To terminate the algorithm, you stop after step 1 and just calculate the
exact offsets for the layout.

Note that since offsets can only be decreased by step 2, modifications to
relocations/content always remain valid.

Parallelizing cumulative sum type calculations are well-studied in the
context of GPU's and hardware reduction trees. See e.g.
https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch39.html

Note that we could in theory visit the relocations in parallel in step 2,
but then the changes to section's content would require a more complicated
algorithm, with each relaxation producing a size delta value that is then
accumulated with a cumulative sum type operation to coalesce the input
section and remove dead space. This type of pattern is common on GPU's;
e.g. AMD GPUs have a special instruction ds_ordered_count which is
optimized for precisely this operation; e.g. slide 42 of
https://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf

-- Sean Silva

>
> In order to shrink instructions, all address references must be explicitly
> represented as relocations even if they are in the same section. I think
> that means object files for RISC-V have many more relocations than the
> other architectures. Is this correct?
>
>
>>
>
>> On Tue, Jul 11, 2017 at 1:59 PM, Peter Smith via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hello,
>>>
>>> To the best of my knowledge I think the closest analogue is something
>>> like the Synthetic EHFrame and MergeInputSections, not strictly code
>>> relaxation, but these do involve changes in size of sections.
>>>
>>> Can I ask you a quick question?  In many architectures not all
>>> pc-relative offsets are exposed to the linker as relocations so it
>>> isn't safe to change the sizes of sections in arbitrary places [*];
>>> does the RiscV ABI have restrictions on code-generation to allow a
>>> linker to reduce the code-size of a code-sequence within a Section? If
>>> there are constraints on the relaxations it might help us make a
>>> suggestion.
>>>
>>> I'm assuming that if you are doing some kind of range based relaxation
>>> you'll need something like range extension thunks (I'm working on
>>> these right now) this means you'll probably have to do your
>>> calculations on what you can relax in finalizeSections() at a similar
>>> point to createThunks(), or perhaps the mechanisms would need to be
>>> merged as I think they'll need to converge on a point when no more
>>> relaxations are possible and no more thunks can be added.
>>>
>>> Writing out the relaxed sections will be interesting as you won't want
>>> all of the InputSectionContents. I suggest looking at EHFrame and
>>> MergeInputSections for ideas.
>>>
>>> Hope that is of some use
>>>
>>> Peter
>>>
>>> [*] For example in pseudo ARM
>>>
>>>     ldr r0, [pc, offset] ; where pc + offset == label
>>>     ...
>>>     relaxable sequence such as an indirect jump via a register
>>>     ...
>>> label: .word foo
>>>
>>> If the compiler/assembler has pre-computed the offset to label then
>>> changing the size of the relaxable sequence without also updating the
>>> offset will break the program.
>>>
>>>
>>>
>>> On 11 July 2017 at 11:09, PkmX via llvm-dev <llvm-dev at lists.llvm.org>
>>> wrote:
>>> > Hi,
>>> >
>>> > Does lld support linker relaxation that may shrink code size? As far
>>> > as I see lld seems to assume that the content of input sections to be
>>> > fixed other than patching up relocations, but I believe some targets
>>> > may benefit the extra optimization opportunity with relaxation.
>>> > Specifically, I'm currently working on adding support for RISC-V in
>>> > lld, and RISC-V heavily relies on linker relaxation to remove
>>> > extraneous code and to handle alignment. Since linker relaxation may
>>> > be of interest to other targets as well, I'm wondering what would be a
>>> > good way to modify lld to support that. Thanks.
>>> >
>>> > --
>>> > Chih-Mao Chen (PkmX)
>>> > Software R&D, Andes Technology
>>> > _______________________________________________
>>> > LLVM Developers mailing list
>>> > llvm-dev at lists.llvm.org
>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170711/791d09d2/attachment-0001.html>