[llvm-dev] [RFC][LLD][ARM] Initial ARM port for LLD

Peter Smith via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 3 23:59:00 PDT 2016

On 3 June 2016 at 20:23, Rui Ueyama <ruiu at google.com> wrote:
> On Fri, Jun 3, 2016 at 2:28 AM, Peter Smith via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hello everyone,
>> The review http://reviews.llvm.org/D20951 implements initial support
>> for the ARM architecture in LLD. To keep the patch size down, and to
>> avoid the complexities of interworking between ARM and Thumb, there
>> is just enough support for an ARM only Hello World to link and run on
>> ARM Linux [*].
>> My main aim is to get this functionality committed as the basis of an
>> ARM port, and I would like to know how best to go about this. I wanted to
>> start the ARM port with enough functionality to execute at least one
>> simple program, although this still might be too large to face in one
>> review. I'm happy to split it up into a series of smaller patches with
>> one relocation, one test file. Alternatively if you think more
>> functionality is needed, please let me know what I need to complete
>> for a first commit.
>> The initial goal for Linaro is ARM Linux support for ARMv7 and AArch32
>> ARMv8. With interworking, TLS and exception support added, the ARM
>> support and test coverage should be on a par with the current state of
>> the lld AArch64 support.
>> There are several major pieces of work to do for ARM Linux. My rough
>> order of priority on what to work on next:
>> Support for Thumb, Thumb2 and interworking:
>> - Thumb relocation directives.
>> - BL to BLX transformation for function calls between ARM and Thumb.
>> - Thunk generation for B immediate.
>> - Keeping track of which parts are ARM/Thumb.
> Simon implemented "thunk" for MIPS. A thunk contains linker-generated machine
> code, is added after each input section in the result, and bridges
> incompatible function calls. I think you can use it for calls between ARM
> Thumb and non-Thumb code.

Yes, that is a good place to start looking.
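
For the simpler BL to BLX item in the list above, no thunk is needed at
all: the linker can rewrite the instruction in place. A rough Python
sketch of that rewrite for an ARM-state caller follows. The encodings
(BL with condition AL is 0xEB000000 | imm24, BLX immediate is
0xFA000000 | H << 24 | imm24) are from the ARM Architecture Reference
Manual; the function name is made up for illustration and this is not
code from D20951.

```python
def arm_bl_to_blx(insn, offset):
    """Rewrite an unconditional ARM BL as a BLX to a Thumb target.

    insn is the 32-bit BL instruction word. offset is the byte distance
    from PC (instruction address + 8) to the target; it must be 2-byte
    aligned and within the +/-32 MB branch range.
    """
    # BLX (immediate) has no condition field, so only BL with cond == AL
    # (top byte 0xEB) can be converted.
    if insn & 0xFF000000 != 0xEB000000:
        raise ValueError("not an unconditional BL")
    if offset & 1:
        raise ValueError("target offset must be halfword aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target out of range")
    imm24 = (offset >> 2) & 0xFFFFFF  # word part of the offset
    h = (offset >> 1) & 1             # extra halfword bit, stored in bit 24
    return 0xFA000000 | (h << 24) | imm24

blx = arm_bl_to_blx(0xEB000000, 0x102)
```

A conditional BL cannot be handled this way, which is one reason a
thunk mechanism is still needed.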

>> Support for TLS
>> - Add relocation and relaxation support for standard and descriptor
>> based models.
> I think we already have a good foundation, so I guess you only need to add
> architecture-specific code there in Relocations.cpp.

Yes, I think this is just a matter of doing the work.

>> Support for C++ exceptions:
>> - Creation of a PT_ARM_EXIDX program header.
>> - Support for the SHF_LINK_ORDER (used by .ARM.exidx).
> Is there anything special in them from the linker's perspective?

PT_ARM_EXIDX is analogous to PT_GNU_EH_FRAME, it just describes the
address range of the output section.
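
To make that concrete, here is a minimal Python sketch of the program
header a linker might emit. The PT_ARM_EXIDX value (0x70000001) and the
Elf32_Phdr field order come from the ELF and ARM ABI documents; the
helper name, the example addresses and the choice of 4-byte alignment
are assumptions for illustration only.

```python
import struct

PT_ARM_EXIDX = 0x70000001  # PT_LOPROC + 1, from the ARM ELF ABI
PF_R = 0x4                 # segment is readable

def make_exidx_phdr(offset, vaddr, size, little_endian=True):
    """Pack an Elf32_Phdr covering the .ARM.exidx output section.

    Hypothetical helper: offset, vaddr and size describe where the
    output section ended up in the file and in memory.
    """
    endian = "<" if little_endian else ">"
    # Field order: p_type, p_offset, p_vaddr, p_paddr,
    #              p_filesz, p_memsz, p_flags, p_align
    return struct.pack(endian + "8I", PT_ARM_EXIDX, offset, vaddr, vaddr,
                       size, size, PF_R, 4)

phdr = make_exidx_phdr(offset=0x1000, vaddr=0x11000, size=0x40)
```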

SHF_LINK_ORDER is a little more disruptive, although it is in the
generic ELF spec and is not ARM specific. In summary, it is used to
place the .ARM.exidx sections in the same order as the .text sections
they describe, so that the unwinder can binary search the .ARM.exidx
sections. The wording in ELF is not great, but the intent is clear.
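
As a rough illustration of that intent, the Python sketch below sorts
index sections by the output address of the section each one links to,
then does the kind of binary search an unwinder relies on. The tuple
types and function names are invented for the example, and the model
is far simpler than real ELF sections.

```python
from bisect import bisect_right
from collections import namedtuple

# Hypothetical, simplified model: each .ARM.exidx input section records
# (via sh_link in real ELF) which .text input section it describes.
TextSection = namedtuple("TextSection", "name out_addr")
ExidxSection = namedtuple("ExidxSection", "name link")

def order_exidx(exidx_sections):
    """Order .ARM.exidx sections by the address of their linked section."""
    return sorted(exidx_sections, key=lambda s: s.link.out_addr)

def find_entry(ordered, pc):
    """Binary search for the last entry that starts at or before pc."""
    starts = [s.link.out_addr for s in ordered]
    i = bisect_right(starts, pc) - 1
    return ordered[i] if i >= 0 else None

text_a = TextSection(".text.a", 0x8000)
text_b = TextSection(".text.b", 0x8100)
text_c = TextSection(".text.c", 0x8240)
# Input order deliberately differs from the .text output order.
exidx = [ExidxSection("exidx.c", text_c),
         ExidxSection("exidx.a", text_a),
         ExidxSection("exidx.b", text_b)]
ordered = order_exidx(exidx)
```

The binary search only gives correct answers because the table is
sorted by address, which is what SHF_LINK_ORDER guarantees.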

>> Support for range-extension
>> - Thumb2 BL range is only 16 MB, conditional branch range is only 1
>> MB. Range extension thunks are likely to be needed for large programs.
> This can be added to thunks, I guess.

In theory yes, although it can get complicated in the most general
case. For example, adding thunks can increase the distance between
sections, which can generate the need for more thunks. A simple
implementation can handle the majority of cases though.
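
The feedback effect can be shown with a toy model in Python. This is
only a sketch under strong assumptions: sections are laid out back to
back, distances are measured between section starts, every thunk has a
made-up fixed size and can reach any target, and thunks are placed
after the calling section. None of the names come from lld.

```python
THUNK_SIZE = 8  # hypothetical size of one long-branch thunk, in bytes

def layout(sizes):
    """Return the start address of each section when placed back to back."""
    addrs, addr = [], 0
    for size in sizes:
        addrs.append(addr)
        addr += size
    return addrs

def add_range_thunks(sizes, calls, branch_range):
    """Add thunks until every remaining direct branch is in range.

    sizes: section sizes in bytes.
    calls: (caller, callee) pairs of section indices.
    Returns the final sizes and the set of calls routed through a thunk.
    """
    sizes = list(sizes)
    thunked = set()
    changed = True
    while changed:  # repeat until a pass adds no new thunk
        changed = False
        addrs = layout(sizes)
        for call in calls:
            caller, callee = call
            if call in thunked:
                continue
            if abs(addrs[callee] - addrs[caller]) > branch_range:
                thunked.add(call)
                sizes[caller] += THUNK_SIZE  # thunk goes after the caller
                changed = True
    return sizes, thunked

# The call 0 -> 1 is exactly in range at first, but the thunk added for
# 0 -> 2 grows section 0 and pushes section 1 out of range.
final_sizes, thunked = add_range_thunks([100, 96, 10], [(0, 1), (0, 2)], 100)
```

The loop terminates because thunks are only ever added and there is a
finite number of calls, but it may take several passes to settle.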

>> Support for big-endian ARM targets
>> - ARMv6 and above use little-endian instructions and big-endian data.
>> The input objects for ARM have big-endian instructions so the linker
>> must byte-reverse each instruction.
> Does this mean you need to read every 4 bytes in .text (or 2 bytes if thumb)
> and swap the byte order of the word, if the target is big-endian? If the
> linker is supposed to do it, we have no choice other than doing it, but
> it's... weird.

Yes, and it is definitely weird. It is a consequence of ARM supporting
two big-endian modes for legacy reasons: one with big-endian
instructions and big-endian data, and one with little-endian
instructions and big-endian data. By making the linker do the
endian-swapping, the compiler and assembler only need to produce
big-endian instructions and big-endian data.

In AArch64, there are only little-endian instructions with big-endian
data, so compilers and assemblers can produce the format directly.

In practice little-endian ARM systems are much more common than
big-endian systems so this is something that can be done later on.
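
For reference, the swap itself is simple. The Python sketch below
reverses each 4-byte ARM instruction or each 2-byte Thumb halfword.
It assumes the caller passes only instruction bytes; a real linker
would have to use the ABI mapping symbols ($a, $t, $d) to avoid
swapping literal data. The function name is made up for illustration.

```python
def swap_instruction_bytes(code, width):
    """Reverse the byte order of every width-byte unit in code.

    width is 4 for ARM instructions and 2 for Thumb halfwords (a 32-bit
    Thumb2 instruction is two halfwords, each swapped on its own).
    """
    if width not in (2, 4):
        raise ValueError("width must be 2 (Thumb) or 4 (ARM)")
    if len(code) % width:
        raise ValueError("code length is not a multiple of the unit width")
    out = bytearray()
    for i in range(0, len(code), width):
        out += code[i:i + width][::-1]
    return bytes(out)

# ARM "mov r0, #0" is 0xE3A00000; a big-endian object stores E3 A0 00 00.
le_arm = swap_instruction_bytes(bytes.fromhex("e3a00000"), 4)
# Thumb "bx lr" is 0x4770; a big-endian object stores 47 70.
le_thumb = swap_instruction_bytes(bytes.fromhex("4770"), 2)
```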

>> Use SHT_ARM_ATTRIBUTES sections for compatibility checking
>> - Detect incompatible objects at link-time rather than risk runtime
>> errors.
> You may want to take a look at the isCompatible function in SymbolTable.cpp,
> which does a similar thing.

Ok, I'll take a look, thanks for pointing it out.

Thank you for the comments.


>> At this stage I haven't thought too hard about how best to implement
>> these. I think that some of these may be disruptive enough to post
>> design alternatives as RFCs rather than as reviews.
>> [*] I tested hello world against an old GCC distribution that has ARM
>> only libraries that do not require interworking:
>> arm-none-linux-gnueabi-gcc (CodeSourcery Sourcery G++ Lite 2007q1-10)
>> 4.2.0.
>> Documentation can be found in the ABI for the ARM Architecture, which
>> is available on ARM's website:
>> http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
>> The official instruction encodings are documented in the ARM
>> Architecture Reference Manual. This is publicly available from ARM
>> but requires a free registration to download:
>> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406c/index.html
