[LLVMdev] Proposal: arbitrary relocations in constant global initializers

Wed Jul 29 12:44:54 PDT 2015

Hi,

I’d like to propose extending the Constant hierarchy with a mechanism for
introducing custom relocations in global initializers. This could also be
seen as a first step towards a “bag-of-bytes with relocations”
representation for global initializers.

Problem

In order to implement control flow integrity for indirect function calls, we
would like to add a set of constructs to the IR that allow a jump table
similar to the one described for IFCC in [1] to be expressed. Ideally the
additions should be minimal, and general-purpose enough to be used for other
purposes.

IFCC, the previous attempt to teach LLVM to emit jump tables, was removed
because it complicated how functions are emitted, in particular by requiring
a subtarget-specific instruction emitter to be available in
subtarget-independent code. However, the form of a jump table entry is
generally well known to whichever component of the compiler creates the jump
table (for example, it needs to know the size of each entry, and therefore
the specific instructions used). We can therefore simplify things greatly by
treating jump table entries not as instructions, but as known strings of
bytes in the .text section with a relocation pointing to the function
address. For example, on x86:

$ cat tc.ll
declare void @foo()

define void @bar() {
  tail call void @foo()
  ret void
}
$ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 |~/src/llvm-build-rel/bin/llvm-objdump -d -r -
<stdin>:	file format ELF64-x86-64

Disassembly of section .text:
bar:
       0:	e9 00 00 00 00 	jmp	0 <bar+5>
		0000000000000001:  R_X86_64_PC32	foo-4-P

Or on ARM:

$ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 -mtriple=armv7-unknown-linux |~/src/llvm-build-rel/bin/llvm-objdump -d -r -

<stdin>:	file format ELF32-arm-little

Disassembly of section .text:
bar:
       0:	fe ff ff ea 	b	#-8 <bar>
			00000000:  R_ARM_JUMP24	foo

How can we represent such jump table entries in IR? One way that almost
works on x86 is to attach a constant to a function using either prefix data
or prologue data, or to place a GlobalVariable in the .text section using
the section attribute. The constant would use ConstantExpr arithmetic to
produce the required PC32 relocation:

define void @bar() prefix <{ i8, i32, i8, i8, i8 }> <{ i8 -23, i32 trunc (i64 add (i64 sub (i64 ptrtoint (void ()* @foo to i64), i64 ptrtoint (void ()* @bar to i64)), i64 3) to i32), i8 -52, i8 -52, i8 -52 }> {
  ...
}

However, this is awkward, and can’t be used to represent an ARM jump table
entry. (It also isn’t quite right; PC32 can trigger the creation of a
PLT entry, which doesn’t entirely match what the ConstantExpr arithmetic
is doing.)
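
To see where the constant 3 and the byte values in the prefix data above come
from, the arithmetic can be checked with a short Python sketch. The addresses
are made up for illustration and the helper names (encode_jmp_prefix,
jmp_target) are hypothetical. The only assumptions are that the 8-byte prefix
data ends exactly at @bar and that an x86 "jmp rel32" displacement is relative
to the end of the 5-byte instruction.

```python
import struct

def encode_jmp_prefix(foo_addr: int, bar_addr: int) -> bytes:
    """Model the 8-byte prefix data <{ i8, i32, i8, i8, i8 }> placed just before @bar."""
    # Same arithmetic as the ConstantExpr: trunc((foo - bar) + 3) to i32.
    rel32 = (foo_addr - bar_addr + 3) & 0xFFFFFFFF
    # i8 -23 is 0xe9 (jmp rel32); i8 -52 is 0xcc (int3 padding).
    return struct.pack("<BI3B", 0xE9, rel32, 0xCC, 0xCC, 0xCC)

def jmp_target(prefix_addr: int, code: bytes) -> int:
    """Decode where the jmp lands, given the address the bytes are placed at."""
    assert code[0] == 0xE9
    (disp,) = struct.unpack_from("<i", code, 1)
    # The displacement is relative to the end of the 5-byte jmp.
    return prefix_addr + 5 + disp

# Hypothetical addresses, chosen only for illustration.
FOO, BAR = 0x401000, 0x402010
prefix = encode_jmp_prefix(FOO, BAR)
landed = jmp_target(BAR - len(prefix), prefix)
assert landed == FOO
```

The jmp starts at @bar - 8 and ends at @bar - 3, so the displacement is
foo - (bar - 3) = (foo - bar) + 3, which is what the ConstantExpr computes.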

Design

A relocation can be seen as having three inputs: the relocation type (on
Mach-O this also includes a pcrel flag), the target, and the addend. So
let’s define a relocation constant like this:

iNN reloc relocation_type (ptr target, iNN addend)

where iNN is some integer type, and ptr is some pointer type. For example,
an ARM jump table entry might look like this:

i32 reloc 0x1d (void ()* @foo, i32 0xeafffffe)  ; R_ARM_JUMP24 = 0x1d

There is no error checking for this; if you use the wrong integer type for
a particular relocation, things will break and you get to keep both pieces.
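
As an illustration of why the addend in the ARM example is the whole
instruction word, here is a rough Python model of how a JUMP24-style
relocation could be resolved against it. This is a sketch under assumptions:
it handles ARM mode only (no Thumb interworking bit), the addresses are made
up, and the function names are hypothetical rather than anything in LLVM or a
linker.

```python
R_ARM_JUMP24 = 0x1D  # value given in the example above

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def apply_jump24(word: int, s: int, p: int) -> int:
    """Resolve the relocation: s = target address, p = address of the word."""
    a = sign_extend(word & 0x00FFFFFF, 24) << 2   # in-place addend, in bytes
    result = (s + a - p) >> 2                     # branch offsets are in words
    return (word & 0xFF000000) | (result & 0x00FFFFFF)

def branch_target(word: int, p: int) -> int:
    """Where an ARM-mode B located at address p lands (PC reads as p + 8)."""
    return p + 8 + (sign_extend(word & 0x00FFFFFF, 24) << 2)

WORD = 0xEAFFFFFE             # 'b' with imm24 = -2, i.e. a byte offset of -8
FOO, ENTRY = 0x8000, 0x9000   # hypothetical addresses
patched = apply_jump24(WORD, FOO, ENTRY)
assert branch_target(patched, ENTRY) == FOO
```

The -8 encoded in 0xeafffffe cancels the +8 pipeline bias of the ARM program
counter, which is why the same word appears in the llc output shown earlier.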

At the asm level, we would add a single directive, ".reloc", whose syntax
would look like this when targeting ELF and COFF:

.reloc size relocation_type target addend

or this when targeting Mach-O:

.reloc size relocation_type pcrel target addend

The code generator would emit this directive when emitting a reloc in a
constant initializer. (Note that this means that reloc constants would only
be supported with the integrated assembler.)

For example, the ARM JUMP24 relocation would look like this:

.reloc 4 0x1d foo 0xeafffffe
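
A minimal Python sketch of how the proposed directive could be parsed, assuming
the fields are whitespace-separated exactly as written above. The Reloc type
and the parse_reloc and emit_bytes helpers are hypothetical names used only for
illustration. emit_bytes additionally assumes a little-endian target that
stores the addend in place, as in the ARM example.

```python
from typing import NamedTuple, Optional

class Reloc(NamedTuple):
    size: int              # size in bytes of the relocated field
    rtype: int             # raw relocation type number
    pcrel: Optional[int]   # only present in the Mach-O form
    target: str            # symbol name
    addend: int

def parse_reloc(line: str, macho: bool = False) -> Reloc:
    """Parse '.reloc size type [pcrel] target addend' (proposed syntax)."""
    fields = line.split()
    expected = 6 if macho else 5
    if len(fields) != expected or fields[0] != ".reloc":
        raise ValueError(f"malformed .reloc directive: {line!r}")
    # int(x, 0) accepts both decimal and 0x-prefixed hexadecimal.
    size, rtype = int(fields[1], 0), int(fields[2], 0)
    pcrel = int(fields[3], 0) if macho else None
    target, addend = fields[-2], int(fields[-1], 0)
    return Reloc(size, rtype, pcrel, target, addend)

def emit_bytes(r: Reloc) -> bytes:
    """Bytes placed in the section before the relocation is applied
    (assumption: little-endian, addend stored in place)."""
    return r.addend.to_bytes(r.size, "little")

r = parse_reloc(".reloc 4 0x1d foo 0xeafffffe")
assert emit_bytes(r).hex() == "feffffea"
```

The emitted bytes match the "fe ff ff ea" shown in the ARM disassembly earlier.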

We would need to add some mechanism for the assembler to evaluate relocations
when the symbol is locally defined and not exported. Since that mechanism has
to understand each relocation type it evaluates, we can start with a small set
of supported "internal" relocations and expand as needed.

What about constant propagation?

We do not want reloc constants to appear in functions' IR, or to be propagated
out of global initializers that use them. The simplest solution to this
problem is to only allow reloc constants in constant initializers where we
cannot/do not currently perform constant propagation, i.e. function prologue
data, prefix data and constants with weak linkage. This could be enforced
by the verifier. Later we can consider relaxing this constraint as needed.
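
The restriction can be pictured with a toy Python model of such a verifier
check. This is not LLVM's Verifier API: the ConstantUse record, the context
labels and the verify function are all hypothetical, and the model only
captures the rule that a reloc constant may appear in the three contexts listed
above and nowhere else.

```python
from dataclasses import dataclass
from typing import List

# Contexts named in the text as places where constant propagation does not happen.
ALLOWED_CONTEXTS = {"prologue_data", "prefix_data", "weak_global_initializer"}

@dataclass
class ConstantUse:
    context: str      # where the constant appears (hypothetical labels)
    has_reloc: bool   # whether the constant contains a reloc expression

def verify(uses: List[ConstantUse]) -> List[str]:
    """Return one error message per reloc constant found in a disallowed context."""
    return [
        f"reloc constant not allowed in {u.context}"
        for u in uses
        if u.has_reloc and u.context not in ALLOWED_CONTEXTS
    ]

errors = verify([
    ConstantUse("prefix_data", has_reloc=True),            # accepted
    ConstantUse("function_body", has_reloc=True),          # rejected
    ConstantUse("internal_global_initializer", has_reloc=False),  # no reloc, ignored
])
assert errors == ["reloc constant not allowed in function_body"]
```

Relaxing the constraint later would amount to growing the allowed set.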

Other uses

Relocation constants could be used for other purposes by frontends. For
example, a frontend may need to represent some other kind of custom
instruction sequence in IR, or to create arbitrary kinds of references
between objects where that may be beneficial. For instance,
-fsanitize=function could use this facility to create GOTOFF relocations in
function prologues, which would avoid creating dynamic relocations in the
.text section and thereby fix PR17633.

Thanks,
-- 
Peter

[1] http://www.pcc.me.uk/~peter/acad/usenix14.pdf