[LLVMdev] Proposal: function prefix data

Wed Jul 17 18:06:09 PDT 2013

Hi,

I would like to propose that we introduce a mechanism in IR to allow
arbitrary data to be stashed before a function body.  The purpose of
this would be to allow additional data about a function to be looked
up via a function pointer.  Two use cases come to mind:

1) We'd like to be able to use UBSan to check that the type of the
   function pointer of an indirect function call matches the type of
   the function being called.  This can't really be done efficiently
   without storing type information near the function.

2) We'd like to allow GHC's tables-next-to-code ABI [1] to be
   implemented.  More generally, I imagine this feature could be
   useful for implementing languages which require runtime metadata
   for each function.

The proposal is that an IR function definition acquires a constant
operand which contains the data to be emitted immediately before
the function body (known as the prefix data).  To access the data
for a given function, a program may bitcast the function pointer to
a pointer to the constant's type.  This implies that the IR symbol
points to the start of the prefix data.

To maintain the semantics of ordinary function calls, the prefix data
must have a particular format.  Specifically, it must begin with a
sequence of bytes which decode to a sequence of machine instructions,
valid for the module's target, which transfer control to the point
immediately succeeding the prefix data, without performing any other
visible action.  This allows the inliner and other passes to reason
about the semantics of the function definition without needing to
reason about the prefix data.  Obviously this makes the format of the
prefix data highly target dependent.

This requirement could be relaxed when combined with my earlier symbol
offset proposal [2] as applied to functions.  However, this is outside
the scope of the current proposal.

Example:

%0 = type <{ i32, i8* }>

define void @f() prefix %0 <{ i32 1413876459, i8* bitcast ({ i8*, i8* }* @_ZTIFvvE to i8*) }> {
  ret void
}

This is an example of something that UBSan might generate on an
x86_64 machine.  It consists of a 4-byte signature followed by a
pointer to the RTTI data for the type 'void ()'.  The signature, when
laid out as a little-endian 32-bit integer, decodes to the instruction
'jmp .+0x0c' (which jumps to the instruction immediately succeeding
the 12-byte prefix) followed by the bytes 'F' and 'T', which identify
the prefix as a UBSan function type prefix.
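
The byte layout described above can be checked with a short Python
sketch.  It only unpacks the i32 from the example and looks at the
resulting bytes; the variable names are illustrative and not part of
the proposal.

```python
import struct

SIGNATURE = 1413876459  # the i32 from the IR example

# Lay the value out as a little-endian 32-bit integer, as on x86_64.
raw = struct.pack("<I", SIGNATURE)

opcode, rel8 = raw[0], raw[1]
tag = raw[2:4]

# 0xEB is the x86 short jump (jmp rel8).  The displacement is relative
# to the end of the 2-byte instruction, so the target measured from the
# start of the prefix is 2 + rel8.
jump_target = 2 + rel8

# Packed <{ i32, i8* }> on x86_64: 4-byte signature + 8-byte pointer.
prefix_size = struct.calcsize("<IQ")

print(raw.hex())      # eb0a4654
print(jump_target)    # 12
print(tag)            # b'FT'
print(prefix_size)    # 12
```

The jump target and the prefix size agree, so control lands on the
first instruction after the prefix.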

A caller might check that a given function pointer has a valid signature
like this:

  %4 = bitcast void ()* @f to %0*
  %5 = getelementptr %0* %4, i32 0, i32 0
  %6 = load i32* %5
  %7 = icmp eq i32 %6, 1413876459
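
The same check can be modelled outside IR.  The Python sketch below
treats the bytes at a function symbol as a buffer and mirrors the
load-and-compare sequence; RTTI_ADDR and the helper names are
hypothetical stand-ins chosen for illustration, not anything UBSan
defines.

```python
import struct

SIGNATURE = 1413876459   # same i32 as in the IR example
RTTI_ADDR = 0x601040     # hypothetical address standing in for @_ZTIFvvE
RET = b"\xc3"            # x86 'ret', standing in for the function body


def make_function_image(body: bytes, with_prefix: bool) -> bytes:
    """Bytes found at the function symbol: optional prefix, then body."""
    prefix = struct.pack("<IQ", SIGNATURE, RTTI_ADDR) if with_prefix else b""
    return prefix + body


def has_type_prefix(image: bytes) -> bool:
    """Mirror of the IR check: load an i32 at the symbol and compare."""
    if len(image) < 4:
        return False
    (sig,) = struct.unpack_from("<I", image, 0)
    return sig == SIGNATURE


def rtti_pointer(image: bytes) -> int:
    """Read the pointer stored right after the 4-byte signature."""
    (ptr,) = struct.unpack_from("<Q", image, 4)
    return ptr


f_image = make_function_image(RET, with_prefix=True)
g_image = make_function_image(RET, with_prefix=False)

print(has_type_prefix(f_image))          # True
print(has_type_prefix(g_image))          # False
print(hex(rtti_pointer(f_image)))        # 0x601040
```

A caller would only read the RTTI pointer after the signature check
succeeds, since a function without prefix data has arbitrary code
bytes at that offset.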

In the specific case above, where the function pointer is a constant,
optimisation passes such as globalopt could potentially be adapted
to recognise prefix data and hence replace %6 and %7 with constants.
(This is one reason why I decided to represent prefix data in IR
rather than, say, using inline asm as proposed in the GHC thread [1].)

Thoughts?

Thanks,
-- 
Peter

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-February/047550.html
[2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-April/061511.html