[llvm-dev] Proposal: arbitrary relocations in constant global initializers

Rafael Espíndola via llvm-dev llvm-dev at lists.llvm.org
Wed Aug 26 08:50:54 PDT 2015


Now with the correct list.

On 26 August 2015 at 11:49, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> This is pr10368.
>
> Do we really need to support hard-coded relocation numbers? It looks like
> the examples in the proposal can be represented as constant expressions:
>
>  (sub (add (ptrtoint @foo) 0xeafffffe) cur_pos)
>
> no?
>
> Why do you need to prevent them from appearing in function
> bodies? It would be unusual but valid to pass the above value as an
> argument to a function.
>
> Cheers,
> Rafael
>
>
>
> On 29 July 2015 at 15:44, Peter Collingbourne <peter at pcc.me.uk> wrote:
>> Hi,
>>
>> I’d like to propose extending the Constant hierarchy with a mechanism for
>> introducing custom relocations in global initializers. This could also be
>> seen as a first step towards a “bag-of-bytes with relocations”
>> representation for global initializers.
>>
>> Problem
>>
>> In order to implement control flow integrity for indirect function calls, we
>> would like to add a set of constructs to the IR that ultimately allow a jump
>> table similar to the one described for IFCC in [1] to be expressed. Ideally
>> the additions should be minimal, and general enough to be useful for other
>> purposes as well.
>>
>> IFCC, the previous attempt to teach LLVM to emit jump tables, was removed
>> because it complicated how functions are emitted; in particular, it required
>> a subtarget-specific instruction emitter to be available in
>> subtarget-independent code. However, the form of a jump table entry is
>> generally well known to whichever component of the compiler creates the
>> jump table (for example, that component needs to know the size of each
>> entry, and therefore the specific instructions used). We can therefore
>> simplify things greatly by treating jump tables not as sequences of
>> instructions, but as known strings of bytes in the .text section with a
>> relocation pointing to the function address. For example, on x86:
>>
>> $ cat tc.ll
>> declare void @foo()
>>
>> define void @bar() {
>>   tail call void @foo()
>>   ret void
>> }
>> $ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 |~/src/llvm-build-rel/bin/llvm-objdump -d -r -
>> <stdin>:        file format ELF64-x86-64
>>
>> Disassembly of section .text:
>> bar:
>>        0:       e9 00 00 00 00  jmp     0 <bar+5>
>>                 0000000000000001:  R_X86_64_PC32        foo-4-P
>>
>>
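A minimal Python sketch of the arithmetic behind the x86 entry above, assuming the x86-64 ELF definition of R_X86_64_PC32 (field = S + A - P, stored as a signed 32-bit little-endian value) and made-up addresses for `bar` and `foo`; the helper names are hypothetical. With A = -4 (the `foo-4-P` in the dump), the displacement ends up measured from the end of the 5-byte `jmp`, so the branch lands exactly on `foo`.

```python
import struct

def resolve_pc32(s: int, a: int, p: int) -> bytes:
    """R_X86_64_PC32: S + A - P as a signed 32-bit little-endian field."""
    value = s + a - p
    if not -2**31 <= value < 2**31:
        raise OverflowError("PC32 displacement out of range")
    return struct.pack("<i", value)

def jmp_entry(entry_addr: int, target: int) -> bytes:
    """A 5-byte `jmp rel32` (opcode 0xe9) at entry_addr, relocation applied."""
    field_addr = entry_addr + 1            # P: the 4-byte field follows the opcode
    return b"\xe9" + resolve_pc32(target, -4, field_addr)

def jmp_destination(entry_addr: int, code: bytes) -> int:
    """The CPU adds the displacement to the address of the next instruction."""
    (disp,) = struct.unpack("<i", code[1:5])
    return entry_addr + 5 + disp

entry = jmp_entry(0x401000, 0x402000)          # hypothetical addresses of bar and foo
print(entry.hex())                              # e9fb0f0000
print(hex(jmp_destination(0x401000, entry)))    # 0x402000
```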
>>
>> Or on ARM:
>>
>> $ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 -mtriple=armv7-unknown-linux |~/src/llvm-build-rel/bin/llvm-objdump -d -r -
>>
>> <stdin>:        file format ELF32-arm-little
>>
>> Disassembly of section .text:
>> bar:
>>        0:       fe ff ff ea     b       #-8 <bar>
>>                         00000000:  R_ARM_JUMP24 foo
>>
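The ARM entry works differently: `fe ff ff ea` is the little-endian word 0xeafffffe, a B instruction whose 24-bit immediate is -2, i.e. a branch to PC + 8 - 8, which is itself (hence `b #-8 <bar>`). R_ARM_JUMP24 is a REL relocation, so its addend is read from that immediate. A minimal sketch, assuming the ARM ELF definition ((S + A) | T) - P with T = 0 for an ARM-state target; the addresses and helper names are hypothetical.

```python
import struct

def sign_extend(value: int, bits: int) -> int:
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def apply_jump24(insn: int, s: int, p: int) -> int:
    """Apply R_ARM_JUMP24 (S + A - P) to a B instruction word.

    REL style: A is the instruction's imm24 field, sign-extended and scaled
    by 4. Only imm24 is rewritten; condition and opcode bits are kept."""
    addend = sign_extend(insn & 0xFFFFFF, 24) * 4
    x = s + addend - p
    if x % 4 or not -2**25 <= x < 2**25:
        raise ValueError("JUMP24 target misaligned or out of range")
    return (insn & 0xFF000000) | ((x >> 2) & 0xFFFFFF)

def branch_target(insn: int, p: int) -> int:
    """ARM-state B: target = P + 8 + imm24 * 4."""
    return p + 8 + sign_extend(insn & 0xFFFFFF, 24) * 4

insn = struct.unpack("<I", bytes.fromhex("feffffea"))[0]   # bytes from the dump
print(hex(insn), hex(branch_target(insn, 0x8000)))         # 0xeafffffe 0x8000
patched = apply_jump24(insn, s=0x9000, p=0x8000)           # hypothetical addresses
print(hex(patched), hex(branch_target(patched, 0x8000)))   # 0xea0003fe 0x9000
```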
>>
>> How can we represent such jump table entries in IR? One way that almost
>> works on x86 is to attach a constant to a function using either prefix data
>> or prologue data, or to place a GlobalVariable in the .text section using
>> the section attribute. The constant would use ConstantExpr arithmetic to
>> produce the required PC32 relocation:
>>
>> define void @bar() prefix <{ i8, i32, i8, i8, i8 }> <{ i8 -23, i32 trunc (i64 add (i64 sub (i64 ptrtoint (void ()* @foo to i64), i64 ptrtoint (void ()* @bar to i64)), i64 3) to i32), i8 -52, i8 -52, i8 -52 }> {
>>   ...
>> }
>>
>> However, this is awkward, and can’t be used to represent an ARM jump table
>> entry. (It also isn’t quite right; PC32 can trigger the creation of a
>> PLT entry, which doesn’t entirely match what the ConstantExpr arithmetic
>> is doing.)
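To see why the prefix-data constant above has that shape: -23 and -52 are the signed i8 spellings of 0xe9 (`jmp rel32`) and 0xcc (`int3` padding). Because prefix data is laid out immediately before the function's entry point, the 8-byte prefix starts at bar - 8 and the `jmp`'s next instruction is at bar - 3, which is where the `+ 3` comes from. A minimal Python sketch with hypothetical addresses; it only models the ConstantExpr arithmetic, not the PLT behaviour of a real PC32 relocation.

```python
import struct

def prefix_bytes(foo: int, bar: int) -> bytes:
    """Pack <{ i8 -23, i32 trunc(foo - bar + 3), i8 -52, i8 -52, i8 -52 }>."""
    rel32 = (foo - bar + 3) & 0xFFFFFFFF      # the `trunc ... to i32`
    return struct.pack("<bIbbb", -23, rel32, -52, -52, -52)

foo, bar = 0x402000, 0x401010                  # hypothetical addresses
data = prefix_bytes(foo, bar)
start = bar - len(data)                        # prefix data ends where @bar begins
(disp,) = struct.unpack("<i", data[1:5])
print(data.hex())                              # e9f30f0000cccccc
print(hex(start + 5 + disp))                   # 0x402000: the jmp reaches @foo
```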
>>
>> Design
>>
>> A relocation can be seen as having three inputs: the relocation type (on
>> Mach-O this also includes a pcrel flag), the target, and the addend. So
>> let’s define a relocation constant like this:
>>
>> iNN reloc relocation_type (ptr target, iNN addend)
>>
>> where iNN is some integer type, and ptr is some pointer type. For example,
>> an ARM jump table entry might look like this:
>>
>> i32 reloc 0x1d (void ()* @foo, i32 0xeafffffe)  ; R_ARM_JUMP24 = 0x1d
>>
>> There is no error checking for this; if you use the wrong integer type for
>> a particular relocation, things will break and you get to keep both pieces.
>>
>> At the asm level, we would add a single directive, ".reloc", whose syntax
>> would look like this when targeting ELF and COFF:
>>
>> .reloc size relocation_type target addend
>>
>> or this when targeting Mach-O:
>>
>> .reloc size relocation_type pcrel target addend
>>
>> The code generator would emit this directive when emitting a reloc in a
>> constant initializer. (Note that this means that reloc constants would only
>> be supported with the integrated assembler.)
>>
>> For example, the ARM JUMP24 relocation would look like this:
>>
>> .reloc 4 0x1d foo 0xeafffffe
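To make the proposed syntax concrete, here is a hypothetical Python model of a reloc constant and of the `.reloc` line the code generator would print for it under the proposed syntax. The class and method names are made up, the Mach-O pcrel flag is assumed to be written as 0 or 1 (the proposal does not specify its spelling), and, as in the proposal, the integer width is not checked against the relocation type.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RelocConstant:
    """Hypothetical model of `iNN reloc relocation_type (ptr target, iNN addend)`."""
    bits: int          # width of iNN; deliberately not validated
    reloc_type: int    # object-format relocation number, e.g. 0x1d
    target: str        # symbol name of the pointer operand
    addend: int

    def directive(self, pcrel: Optional[bool] = None) -> str:
        """ELF/COFF form when pcrel is None, Mach-O form (with pcrel flag) otherwise."""
        fields = [str(self.bits // 8), hex(self.reloc_type)]
        if pcrel is not None:
            fields.append("1" if pcrel else "0")   # assumed spelling of the flag
        fields += [self.target, hex(self.addend)]
        return ".reloc " + " ".join(fields)

jump24 = RelocConstant(bits=32, reloc_type=0x1D, target="foo", addend=0xEAFFFFFE)
print(jump24.directive())   # .reloc 4 0x1d foo 0xeafffffe
```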
>>
>> We would need to add some mechanism for the assembler to evaluate relocations
>> itself when the symbol is locally defined and not exported. Since each such
>> relocation type needs its own evaluation logic, we can start with a small set
>> of supported "internal" relocations and expand as needed.
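One way the assembler-side evaluation could look, sketched in Python: a hypothetical table of "internal" relocations keyed by object format and relocation number, consulted only for locally defined, non-exported symbols; anything else falls back to emitting a real relocation. The format keys and function names are invented; the numbers are the ELF values of R_X86_64_PC32 (2) and R_ARM_JUMP24 (29, i.e. 0x1d). Range and alignment checks are omitted.

```python
from typing import Callable, Dict, Optional, Tuple

def pc32(addend: int, s: int, p: int) -> int:
    """R_X86_64_PC32 (RELA): the field becomes S + A - P, truncated to 32 bits."""
    return (s + addend - p) & 0xFFFFFFFF

def jump24(addend: int, s: int, p: int) -> int:
    """R_ARM_JUMP24 (REL): `addend` is the initial instruction word (e.g. 0xeafffffe).
    A = sign-extended imm24 * 4; only imm24 is replaced (ARM-state target assumed)."""
    imm = addend & 0xFFFFFF
    a = (imm - (1 << 24) if imm & 0x800000 else imm) * 4
    return (addend & 0xFF000000) | (((s + a - p) >> 2) & 0xFFFFFF)

# Hypothetical table of relocations the assembler can evaluate by itself.
INTERNAL: Dict[Tuple[str, int], Callable[[int, int, int], int]] = {
    ("elf-x86-64", 2): pc32,     # R_X86_64_PC32
    ("elf-arm", 0x1D): jump24,   # R_ARM_JUMP24
}

def try_resolve(fmt: str, reloc_type: int, addend: int,
                sym_addr: Optional[int], is_local: bool, p: int) -> Optional[int]:
    """Return the final field contents, or None if a relocation must be emitted."""
    evaluate = INTERNAL.get((fmt, reloc_type))
    if evaluate is None or not is_local or sym_addr is None:
        return None              # unsupported type or non-local symbol: emit the relocation
    return evaluate(addend, sym_addr, p)

print(hex(try_resolve("elf-arm", 0x1D, 0xEAFFFFFE, 0x9000, True, 0x8000)))  # 0xea0003fe
print(try_resolve("elf-arm", 0x1D, 0xEAFFFFFE, None, False, 0x8000))        # None
```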
>>
>> What about constant propagation?
>>
>> We do not want reloc constants to appear in functions' IR, or to be propagated
>> out of global initializers that use them. The simplest solution to this
>> problem is to only allow reloc constants in constant initializers where we
>> cannot/do not currently perform constant propagation, i.e. function prologue
>> data, prefix data and constants with weak linkage. This could be enforced
>> by the verifier. Later we can consider relaxing this constraint as needed.
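A minimal sketch of the verifier rule described above, in Python rather than as a real LLVM Verifier check. The `Site` classification and `reloc_allowed` function are hypothetical, and treating exactly the `weak` linkage as "weak linkage" is an assumption; the proposal does not say which linkage kinds qualify.

```python
from enum import Enum

class Site(Enum):
    """Hypothetical classification of where a constant appears in IR."""
    FUNCTION_BODY = "function body"
    PROLOGUE_DATA = "prologue data"
    PREFIX_DATA = "prefix data"
    GLOBAL_INITIALIZER = "global initializer"

def reloc_allowed(site: Site, linkage: str = "") -> bool:
    """Allow reloc constants only where no constant propagation happens."""
    if site in (Site.PROLOGUE_DATA, Site.PREFIX_DATA):
        return True
    if site is Site.GLOBAL_INITIALIZER:
        # Assumption: only plain `weak` linkage counts as "weak linkage" here.
        return linkage == "weak"
    return False  # function bodies and any other use are rejected

for site, linkage in [(Site.PREFIX_DATA, ""), (Site.GLOBAL_INITIALIZER, "weak"),
                      (Site.GLOBAL_INITIALIZER, "internal"), (Site.FUNCTION_BODY, "")]:
    print(site.value, linkage or "-", reloc_allowed(site, linkage))
```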
>>
>> Other uses
>>
>> Relocation constants could also be used by frontends for other purposes. For
>> example, a frontend may need to represent some other kind of custom
>> instruction sequence in IR, or to create arbitrary kinds of references
>> between objects where that may be beneficial (for example, -fsanitize=function
>> could use this facility to create GOTOFF relocations in function prologues,
>> avoiding dynamic relocations in the .text section and thereby fixing PR17633).
>>
>> Thanks,
>> --
>> Peter
>>
>> [1] http://www.pcc.me.uk/~peter/acad/usenix14.pdf

