[PATCH] D36749: [LLD][ELF][AArch64] Complete implementation of -fix-cortex-a53-843419

Wed Sep 20 08:01:45 PDT 2017

peter.smith added a comment.

Thank you very much for the comment; it is helpful to know what your constraints are. AArch64, and especially Arm, are somewhat awkward targets, as the ABI has been designed to move some of the toolchain complexity into the linker in order to simplify the compiler. Apologies in advance for the long response, which I hope makes sense and is of some use. While I would obviously prefer to get these in earlier, I will be at the US developers meeting next month and will be happy to discuss.

I've written down what I think the requirements of Thunks and errata patching are with respect to relocations. I think that the majority of the code wouldn't need changing if relocations were heavily refactored. I do understand that they have their own complexity as well, a lot of it down to needing to run very late in the link, as they depend on address assignment.

An abstract thunk implementation:

- It must be possible to iterate over all the relocations in the program and identify the subset of branch relocations that need a Thunk.
- It must be possible to create a linker generated code-sequence (Thunk) and place it within range of the branch instruction.
- It must be possible to resolve the branch to the Thunk instead of the original Target (redirect the relocation).
- It must be possible for the code-sequence generated in the Thunk to transfer control to the original target of the relocation.
- It must be possible to generate a Thunk to a PLT entry.
- It is desirable to be able to reuse as many Thunks as possible.
- It is desirable to guarantee that all branches can reach their target (the alternative is a relocation out of range error message).
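
As a rough illustration of the first requirement, the range test for one branch relocation might look like the sketch below. The names `kBranchRange` and `needsRangeThunk` are hypothetical and are not LLD's API. The only fact assumed is that AArch64 B and BL encode a signed 26-bit word offset, which gives a reach of -128 MiB to +128 MiB minus 4 bytes.

```cpp
#include <cstdint>

// Hypothetical helper, not LLD's actual API.
// AArch64 B/BL hold a signed 26-bit immediate scaled by 4, so the
// reachable byte offsets are [-2^27, 2^27 - 4].
constexpr int64_t kBranchRange = int64_t{1} << 27; // 128 MiB

// Returns true when a B/BL placed at branchAddr cannot reach targetAddr
// directly and would therefore need a Thunk.
inline bool needsRangeThunk(uint64_t branchAddr, uint64_t targetAddr) {
  int64_t offset = static_cast<int64_t>(targetAddr - branchAddr);
  return offset < -kBranchRange || offset >= kBranchRange;
}
```

Because the answer depends on final addresses, a check of this kind can only run after address assignment.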

The existing Thunk implementation (and range thunks proposal):

- We iterate over all the relocations in the program and can read their Type to know that we may need to create a Thunk.
- We create a SyntheticSection for the Thunks, which allows us to create a Symbol that can be used as a target for relocations.
- We change the target of the relocation to the Symbol in the Thunk.
- We use the relocation code in the SyntheticSection to write the transfer of control to the original Target.
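
The retargeting step can be sketched with a toy model. Everything here (`Symbol`, `Relocation`, `ThunkSection`, `redirectToThunk`, the `__thunk_` name prefix) is a simplified stand-in invented for illustration and does not mirror LLD's real classes. It only shows the idea of creating one symbol per destination inside a linker-generated section and pointing the relocation at it, which also gives Thunk reuse.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Simplified stand-ins for illustration only; not LLD's real types.
struct Symbol {
  std::string name;
  uint64_t address;
};

struct Relocation {
  uint64_t offset; // location of the branch within its section
  Symbol *target;  // symbol the branch currently resolves to
};

class ThunkSection {
public:
  // Returns the Thunk symbol for dest, creating it on first use so that
  // several branches to the same destination share one Thunk.
  Symbol *getOrCreateThunk(Symbol *dest) {
    auto it = thunks.find(dest);
    if (it != thunks.end())
      return it->second.get();
    auto sym = std::make_unique<Symbol>(Symbol{"__thunk_" + dest->name, 0});
    Symbol *raw = sym.get();
    thunks.emplace(dest, std::move(sym));
    return raw;
  }

  std::size_t size() const { return thunks.size(); }

private:
  std::map<Symbol *, std::unique_ptr<Symbol>> thunks;
};

// Redirect the relocation: the branch now resolves to the Thunk symbol
// and is processed like any other branch relocation.
inline void redirectToThunk(Relocation &rel, ThunkSection &ts) {
  rel.target = ts.getOrCreateThunk(rel.target);
}
```

In this model the Thunk's own code would carry a relocation against the original destination, so the existing relocation code writes the final transfer of control.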

The alternative to changing the target of the relocation is to modify RelExpr, as is done for PLT generation (when the linker sees R_PLT_PC, it knows that it needs to resolve the relocation not to the target symbol, but to the PLT entry for that symbol). Altering RelExpr does not require a new Symbol to be created, as the relocation Target isn't changed. However, it does mean that the relocation code and other areas need to distinguish between R_PC and R_PLT_PC. There are trade-offs for each approach; for example, if Thunks were to change the RelExpr and indirect via some Thunk target, then we would have to handle R_THUNK_PC and R_THUNK_PLT_PC. Redirecting the relocation Targets means that calls to Thunks can be resolved like any other branch relocation.
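
To make the trade-off concrete, here is a minimal sketch of what the RelExpr route implies. R_THUNK_PC and R_THUNK_PLT_PC are the hypothetical expressions named above, and `Addresses` and `branchDestination` are invented for illustration. Every place that resolves a branch needs a case per expression, whereas redirecting the relocation's target symbol leaves only the ordinary cases.

```cpp
#include <cstdint>

// R_PC and R_PLT_PC exist today; the two THUNK values are hypothetical
// additions that the RelExpr approach would require.
enum class RelExpr { R_PC, R_PLT_PC, R_THUNK_PC, R_THUNK_PLT_PC };

// Illustrative bundle of the addresses a branch might resolve to.
struct Addresses {
  uint64_t sym;        // the target symbol itself
  uint64_t plt;        // the PLT entry for the symbol
  uint64_t thunk;      // a Thunk that jumps to the symbol
  uint64_t thunkToPlt; // a Thunk that jumps to the PLT entry
};

// Every consumer of RelExpr has to grow alongside the enum.
inline uint64_t branchDestination(RelExpr expr, const Addresses &a) {
  switch (expr) {
  case RelExpr::R_PC:
    return a.sym;
  case RelExpr::R_PLT_PC:
    return a.plt;
  case RelExpr::R_THUNK_PC:
    return a.thunk;
  case RelExpr::R_THUNK_PLT_PC:
    return a.thunkToPlt;
  }
  return 0; // unreachable for valid enum values
}
```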

The existing implementation does not really need to be in Relocations.cpp; it could be moved out to Thunks.cpp if the one function that it needs, fromPlt(), is made global. In summary, I think that as long as we can generate Symbols in SyntheticSections and change the target symbol of relocations, then pretty much any redesign of relocations will be fine. It is helpful to reuse the existing relocation code to write the final transfer of control instructions, but there isn't anything stopping that code from being duplicated.

Abstract errata patching requirements (for the cortex-a53-843419 erratum):

- It must be possible to disassemble the code to look for the erratum sequence.
- It must be possible to generate a patch sequence that can be placed within branch range of the erratum sequence.
- It must be possible to replace an instruction in the erratum sequence with a branch to a patch that contains the instruction.
- It must be possible to transfer any relocation on the instruction we replace with a branch to the patch.
- It must be possible to return from the patch to the instruction following the one that was replaced.
- It is desirable for both efficiency of the linker, and to minimize the number of patches, to have precise address information available so that only erratum sequences with page-offset of the ADRP instruction of 0xff8 or 0xffc are considered.
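
The address filter in the last point is cheap once final addresses are known. A minimal sketch, assuming the 4 KiB pages implied by the 0xff8 and 0xffc offsets, is below. The helper names are illustrative and the full erratum sequence check is not shown. The ADRP test relies only on the architectural encoding: bit 31 set and bits 28 to 24 equal to 0b10000.

```cpp
#include <cstdint>

// ADRP encoding: op (bit 31) = 1, immlo (bits 30..29), 0b10000 (bits 28..24),
// immhi, Rd. ADR differs only in having bit 31 clear.
inline bool isADRP(uint32_t instr) {
  return (instr & 0x9f000000u) == 0x90000000u;
}

// Only an ADRP in the last two instruction slots of a 4 KiB page can start
// the erratum sequence, so everything else can be skipped without decoding.
inline bool isCandidateAddress(uint64_t addr) {
  uint64_t pageOffset = addr & 0xfff;
  return pageOffset == 0xff8 || pageOffset == 0xffc;
}
```

Without precise addresses a scanner would have to treat every ADRP as a potential match, which is what makes running after address assignment attractive.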

The current implementation uses relocations:

- We (ab)use the regular encoding of the AArch64 branch instruction to use an R_AARCH64_JUMP26 relocation to replace an instruction with a branch. We may need to overwrite an existing relocation at the location or create a new one.
- We account for an existing relocation at the location we are patching by copying it to the patch.
- We reuse the existing relocation code to write the return address and resolve any relocation we have copied to the patch.
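
For reference, the values that the branch relocations end up writing can be sketched directly. This is not how the patch does it (the patch lets the relocation code do the writing); `encodeBranch`, `Patch`, and `makePatch` are hypothetical names, and the sketch ignores any relocation that has to be copied to the patch. It only shows the arithmetic: an unconditional B is 0x14000000 with the word offset in the low 26 bits.

```cpp
#include <cassert>
#include <cstdint>

// Encodes an unconditional AArch64 B located at `from` that jumps to `to`.
inline uint32_t encodeBranch(uint64_t from, uint64_t to) {
  int64_t offset = static_cast<int64_t>(to - from);
  assert(offset % 4 == 0 && "instructions are 4-byte aligned");
  assert(offset >= -(int64_t{1} << 27) && offset < (int64_t{1} << 27) &&
         "patch must be within branch range");
  return 0x14000000u | (static_cast<uint32_t>(offset >> 2) & 0x03ffffffu);
}

// Hypothetical two-instruction patch: the displaced instruction followed by
// a branch back to the instruction after the patched location.
struct Patch {
  uint32_t movedInstr;
  uint32_t returnBranch;
};

// Replaces the instruction at siteAddr with a branch to patchAddr and
// returns the patch contents.
inline Patch makePatch(uint32_t &site, uint64_t siteAddr, uint64_t patchAddr) {
  Patch p;
  p.movedInstr = site;
  p.returnBranch = encodeBranch(patchAddr + 4, siteAddr + 4);
  site = encodeBranch(siteAddr, patchAddr);
  return p;
}
```

Expressing the same two branches as relocations against symbols in a SyntheticSection means none of this arithmetic needs to be duplicated outside the relocation code.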

I've used relocations to do the dirty work of modifying the instructions, mostly for convenience. I wanted to use the existing ELF building blocks that a compiler might use to construct the patches. It would be possible to add a patch list to InputSection and hard-code the changes without relocations, but I think that this would just be duplicating functionality.

Both of the implementations try to follow the principle of adding SyntheticSections with symbols and retargeting relocations at those Symbols to keep them as an isolated pass. I think that they probably wouldn't get in the way of a major refactor.

https://reviews.llvm.org/D36749