[PATCH] D39744: [LLD][ELF][AArch64] Add support for AArch64 range extension thunks.

Mon Nov 27 04:23:27 PST 2017

peter.smith added a comment.

Adding in comment from [llvm-commits]

>> - I've used Thunks that can access the whole address range. There are more efficient Thunks that could be used; for example, the PI thunk could use the usual ADRP addressing mode, but this would limit the range to +/- 4Gb. In an ideal world we could generate the thunks with a limited range only when we know they are in range, but this will require some changes and additional complexity in the underlying framework, so I chose to keep it simple.
> 
> Do you know what other linkers do? Do they start with an adrp and upgrade
>  the thunk if that is not sufficient?

In gold and bfd, an adrp is used if the destination is within 4Gb; if it drifts out of range, a longer-range "stub", very similar to the ones I'm proposing here, is used.
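
As a rough illustration of that "start short, upgrade when out of range" decision, here is a minimal sketch. The names StubKind, inAdrpRange and chooseStub are hypothetical and are not taken from gold, bfd or LLD; the only hard fact used is that ADRP encodes a signed 21-bit page offset, giving a reach of roughly +/- 4Gb from the page of the ADRP instruction.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not gold, bfd or LLD code.
enum class StubKind { ShortAdrp, LongRange };

// ADRP holds a signed 21-bit count of 4 KiB pages, so the destination page
// must lie within [-2^32, 2^32 - 4096] bytes of the page holding the ADRP.
constexpr bool inAdrpRange(uint64_t stubAddr, uint64_t destAddr) {
  const int64_t pageDelta = static_cast<int64_t>(
      (destAddr & ~uint64_t{0xfff}) - (stubAddr & ~uint64_t{0xfff}));
  return pageDelta >= -(int64_t{1} << 32) &&
         pageDelta <= (int64_t{1} << 32) - 4096;
}

// Prefer the short ADRP-based stub; fall back to a long-range stub when the
// destination has drifted out of ADRP range.
constexpr StubKind chooseStub(uint64_t stubAddr, uint64_t destAddr) {
  return inAdrpRange(stubAddr, destAddr) ? StubKind::ShortAdrp
                                         : StubKind::LongRange;
}
```

A real linker has to re-run this check whenever addresses change, since inserting or growing stubs can itself push a destination out of range.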

> On x86_64 the compiler is required to not use the small code model to support more than 4gb. Is there something like it in aarch64?

Yes, aarch64 has a small and a large code model; the small code model only supports applications up to 4Gb. AFAIK the large code model is implemented in clang and gcc for non-position-independent code, but is not yet implemented for position-independent code. So at this stage you could not realistically build a position-independent example > 4Gb. It is possible to build a non-position-independent executable that large, though.

> Index: test/ELF/aarch64-thunk-section-location.s
> ===================================================================
> --- /dev/null
> +++ test/ELF/aarch64-thunk-section-location.s
> @@ -0,0 +1,41 @@
> +// RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu %s -o %t
> +// RUN: ld.lld %t -o %t2 2>&1
> +// RUN: llvm-objdump -d  -start-address=134086664 -stop-address=134086676 -triple=aarch64-linux-gnu %t2 | FileCheck %s
> +
> +// Check that the range extension thunks are dumped close to the aarch64 branch
> +// range of 128 MiB
> + .section .text.1, "ax", %progbits
> + .balign 0x1000
> + .globl _start
> +_start:
> + bl high_target
> + ret
> +
> + .section .text.2, "ax", %progbits
> + .space 0x2000000
> +
> + .section .text.2, "ax", %progbits
> + .space 0x2000000
> +
> + .section .text.3, "ax", %progbits
> + .space 0x2000000
> +
> + .section .text.4, "ax", %progbits
> + .space 0x2000000 - 0x40000
> +
> + .section .text.5, "ax", %progbits
> + .space 0x40000
> 
> Why do you need multiple sections instead of a single .space with the
>  total amount? To show that the thunk could have been placed earlier but
>  was not?

Yes, exactly that. It isn't a particularly important test case; it is just a check to make sure we try to place the thunk in range of as many callers as possible.
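
For reference, the arithmetic behind the quoted test can be checked with a small sketch. It only sums the .space directives shown above and compares the total against the BL reach (a signed 26-bit word offset, i.e. -2^27 to 2^27 - 4 bytes). The quoted hunk is cut off before high_target is defined, so the sketch assumes it is placed after the padding; the names inBlRange and kTestPadding are made up for illustration.

```cpp
#include <cstdint>

// BL encodes a signed 26-bit offset in units of 4 bytes.
constexpr int64_t kBlMin = -(int64_t{1} << 27);    // -128 MiB
constexpr int64_t kBlMax = (int64_t{1} << 27) - 4; // 128 MiB - 4 bytes

constexpr bool inBlRange(uint64_t from, uint64_t to) {
  const int64_t delta = static_cast<int64_t>(to - from);
  return delta >= kBlMin && delta <= kBlMax;
}

// Sum of the .space directives in the quoted test:
// .text.2 (twice), .text.3, .text.4 and .text.5.
constexpr uint64_t kTestPadding =
    0x2000000 + 0x2000000 + 0x2000000 + (0x2000000 - 0x40000) + 0x40000;

// The padding alone is exactly 128 MiB, which is just beyond what a forward
// BL can cover, so a target placed after it (an assumption here) needs a thunk.
static_assert(kTestPadding == uint64_t{128} * 1024 * 1024,
              "padding totals 128 MiB");
static_assert(!inBlRange(0, kTestPadding),
              "a branch across all of the padding is out of BL range");
```

Splitting the padding across several sections gives the linker more than one candidate location for the thunk, which is what lets the test observe where it ends up.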

> It is surprising that we are expected to support more than 4GB of code, but if
>  that is an aarch64 requirement, LGTM.

Given that it isn't possible to build a position-independent executable > 4Gb, I think it would be safe to do the following:

- Use the load-literal for all non-position independent thunks
- Use an ADRP for position independent thunks

Does that sound preferable?

Ideally, I think I'd prefer to use ADRP for non-position-independent thunks as well, as it can go into execute-only memory. However, I think supporting multiple ranges of thunks is a separate patch.
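
The proposed split (load-literal for non-position-independent thunks, ADRP for position-independent ones) could be sketched roughly as below. ThunkKind, pickThunk and readsOwnSection are hypothetical names, not the ones in the patch, and the instruction sequences in the comments are the typical shapes of such thunks rather than text copied from it.

```cpp
// Hypothetical names for illustration only; not taken from the patch.
enum class ThunkKind {
  // Typical shape: ldr x16, <literal>; br x16; <literal>: .xword target
  // Reaches any 64-bit address, but loads the address from the thunk itself.
  AbsLoadLiteral,
  // Typical shape: adrp x16, target; add x16, x16, :lo12:target; br x16
  // PC-relative, limited to about +/- 4Gb, and performs no data load.
  PiAdrp,
};

// The selection rule from the two bullets above.
constexpr ThunkKind pickThunk(bool positionIndependent) {
  return positionIndependent ? ThunkKind::PiAdrp : ThunkKind::AbsLoadLiteral;
}

// A thunk that reads a literal from its own section cannot be placed in
// execute-only memory; the ADRP form has no such read.
constexpr bool readsOwnSection(ThunkKind kind) {
  return kind == ThunkKind::AbsLoadLiteral;
}
```

The second helper captures why ADRP would also be attractive for non-position-independent output, at the cost of the limited range.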

Peter

https://reviews.llvm.org/D39744