[LLVMdev] Proposal: function prefix data

Thu Jul 18 12:14:59 PDT 2013

On Thu, Jul 18, 2013 at 12:45:32PM -0400, Jevin Sweval wrote:
> On Wed, Jul 17, 2013 at 9:06 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> >
> > To maintain the semantics of ordinary function calls, the prefix data
> > must have a particular format.  Specifically, it must begin with a
> > sequence of bytes which decode to a sequence of machine instructions,
> > valid for the module's target, which transfer control to the point
> > immediately succeeding the prefix data, without performing any other
> > visible action.  This allows the inliner and other passes to reason
> > about the semantics of the function definition without needing to
> > reason about the prefix data.  Obviously this makes the format of the
> > prefix data highly target dependent.
> 
> 
> What if the prefix data was stored before the start of the function
> code? The function's symbol will point to the code just as before,
> eliminating the need to have instructions that skip the prefix data.
> 
> It would look something like:
> | Prefix Data ... (variable length) | Prefix Data Length (fixed length
> [32 bits?]) | Function code .... |
> 
>             ^ function symbol points here (function code)
> 
> I hope the simple ASCII art makes it through my mail client.
> 
> To access the data, you do
> 
> prefix_data = function_ptr - sizeof(prefix_length) - prefix_length

A similar scheme is described in the next paragraph of my email:

> > This requirement could be relaxed when combined with my earlier symbol
> > offset proposal [2] as applied to functions.  However, this is outside
> > the scope of the current proposal.

Unfortunately, this won't work for UBSan, as it needs to be able to
take an arbitrary function pointer and determine whether the prefix
data is present.  If the function lives at the very beginning of a
segment and does not have prefix data, a segfault may occur when
attempting to read the bytes that precede it.
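
To make the problem concrete, here is a rough sketch of the lookup
from the quoted suggestion.  It is only an illustration: a byte buffer
stands in for the text segment, and the names emit_function,
prefix_data and LEN_FIELD are made up for this example and are not
part of any proposal or of LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (from the quoted suggestion, not an existing ABI):
 *   | prefix data (len bytes) | len (32 bits) | function code ... |
 *                                             ^ function symbol
 */
#define LEN_FIELD sizeof(uint32_t)

/* Hypothetical helper: lay out one "function" at the start of seg and
 * return the offset that the function symbol would point to. */
static size_t emit_function(uint8_t *seg, size_t cap,
                            const void *prefix, uint32_t len)
{
    assert(cap >= (size_t)len + LEN_FIELD + 1);
    memcpy(seg, prefix, len);            /* prefix data          */
    memcpy(seg + len, &len, LEN_FIELD);  /* prefix data length   */
    seg[len + LEN_FIELD] = 0xC3;         /* placeholder for code */
    return (size_t)len + LEN_FIELD;
}

/* prefix_data = function_ptr - sizeof(prefix_length) - prefix_length
 *
 * Returns NULL if either read would start before the segment.  This
 * check is only possible because the sketch knows where the segment
 * begins; given an arbitrary function pointer there is no such
 * knowledge, and the read may touch unmapped memory. */
static const uint8_t *prefix_data(const uint8_t *seg, size_t fn_off,
                                  uint32_t *len_out)
{
    uint32_t len;

    if (fn_off < LEN_FIELD)
        return NULL;                     /* no room for the length */
    memcpy(&len, seg + fn_off - LEN_FIELD, LEN_FIELD);
    if (fn_off - LEN_FIELD < len)
        return NULL;                     /* data would precede seg */
    *len_out = len;
    return seg + fn_off - LEN_FIELD - len;
}
```

Note that even when the bounds check passes, a function without prefix
data would have whatever bytes happen to precede it interpreted as a
length, so the layout alone cannot tell the caller whether prefix data
is present.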

It should definitely work for GHC though (and, as I understand it, this
is how the tables-next-to-code ABI is implemented in its non-LLVM backend).

Thanks,
-- 
Peter