[LLVMdev] Proposal: global symbol offsets

Peter Collingbourne peter at pcc.me.uk
Wed Apr 24 00:44:33 PDT 2013


Hi,

I'd like to propose that we introduce a mechanism in LLVM for
declaring that the symbol for a given global variable should be
assembled at a given offset from the start of the data for that global.
The main reason for doing this would be to allow a module to conform to
an externally imposed ABI which requires data to be present below
a symbol.  We have two specific use cases in mind at the moment:

1) The Microsoft C++ ABI, whose vtable symbols point to the first
   virtual function.  The pointer to the RTTI data for the symbol
   resides one pointer width below the first virtual function pointer.

2) Allowing memory safety tools such as Address Sanitizer to emit a redzone
   before each global variable while allowing the external symbols to be
   used by uninstrumented code.
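
To make use case (1) concrete, here is a rough sketch in Python that
models addresses as plain integers.  The 64-bit pointer size, the base
address and the slot names are assumptions made purely for
illustration; they are not taken from the ABI documentation.

```python
POINTER_SIZE = 8   # assumption: 64-bit target
BASE = 0x2000      # hypothetical address where the vtable data starts

# Data for the global, in memory order: RTTI pointer, then the
# virtual function pointers.
slots = ["rtti", "f1", "f2"]
memory = {BASE + i * POINTER_SIZE: name for i, name in enumerate(slots)}

# The externally visible symbol points at the first virtual function,
# i.e. one pointer width past the start of the data.
vt_symbol = BASE + POINTER_SIZE

first_virtual = memory[vt_symbol]
rtti_slot = memory[vt_symbol - POINTER_SIZE]
print(first_virtual, rtti_slot)
```

The redzone case (2) has the same shape: the redzone occupies the
bytes below the symbol, and the symbol marks where the variable's own
data begins.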

Below I've listed the five design alternatives I considered, with their
pros and cons:

------------------------------------------------------------------------
(1) Offset a constant:
@vt = linkonce_odr global [3 x i8*] [i8* @rtti, i8* @f1, i8* @f2], symbol_offset i64 ptrtoint (i8** getelementptr ([3 x i8*]* null, i32 0, i32 1))
or just
@vt = linkonce_odr global [3 x i8*] [i8* @rtti, i8* @f1, i8* @f2], symbol_offset i64 8

In the object file, the symbol (in this case, "vt") would be emitted
at symbol_offset bytes past the start of vt's data.  At the IR level,
@vt would work like a regular global reference: it would refer to the
start of vt's data, that is, the symbol "vt" minus symbol_offset.
symbol_offset must be a constant that evaluates to an absolute value
once the data layout is taken into account.  The symbol_offset
attribute could also be added to external references, and would
generally be expected to be consistent across modules to be linked
(although the IR linker would fix up discrepancies).
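
The relationship between the two views can be sketched as follows.
This is only an illustration in Python with integer addresses; the
function names and the example address are hypothetical, and a 64-bit
pointer size is assumed.

```python
def symbol_address(data_start: int, symbol_offset: int) -> int:
    """Where the object-file symbol would be placed."""
    return data_start + symbol_offset


def ir_reference(symbol_addr: int, symbol_offset: int) -> int:
    """What an IR-level reference such as @vt would denote:
    the object-level symbol minus symbol_offset."""
    return symbol_addr - symbol_offset


POINTER_SIZE = 8      # assumption: 64-bit target
data_start = 0x1000   # hypothetical start of vt's data

sym = symbol_address(data_start, POINTER_SIZE)
start = ir_reference(sym, POINTER_SIZE)
print(hex(sym), hex(start))
```

An external reference that carries the same symbol_offset can recover
the start of the data from the symbol alone, which is why the offset
would be expected to agree across modules.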

Pros:
- Can be used in portable IR
- Can express any arbitrary offset

Cons:
- Rather long-winded

(2) Offset an integer:
@vt = linkonce_odr global [3 x i8*] [i8* @rtti, i8* @f1, i8* @f2], symbol_offset 8

Similar to (1), except this time symbol_offset may only be an integer
in the IR.

Pros:
- Easiest to implement
- Can express an arbitrary offset
- Avoids taking up additional space in the GlobalVariable class, as we
  can just introduce an additional bitfield.

Cons:
- Not useful in portable IR -- but then again, if you're using this
  attribute you're probably doing something nonportable anyway.

(3) Offset a list of GEP parameters:
@vt = linkonce_odr global [3 x i8*] [i8* @rtti, i8* @f1, i8* @f2], symbol_offset (i32 0, i32 1)

Similar to (1) but the symbol position for @vt is evaluated as
though @vt were the first operand of a getelementptr constant, and
the symbol_offset operands were subsequent operands.
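
As a rough sketch of how such operands would translate into a byte
offset, the Python below handles only the simple case of a pointer to
an array of fixed-size elements.  The helper name is hypothetical and
a 64-bit pointer size is assumed.

```python
def array_gep_offset(num_elems: int, elem_size: int, indices) -> int:
    """Byte offset of getelementptr([num_elems x T]* base, i0, i1).

    i0 steps over whole arrays, i1 steps over elements within one.
    """
    i0, i1 = indices
    array_size = num_elems * elem_size
    return i0 * array_size + i1 * elem_size


POINTER_SIZE = 8  # assumption: 64-bit target

# symbol_offset (i32 0, i32 1) on a [3 x i8*] global
offset = array_gep_offset(3, POINTER_SIZE, (0, 1))

# Offsets reachable by indexing elements of this particular array
reachable = {array_gep_offset(3, POINTER_SIZE, (0, i)) for i in range(3)}
print(offset, sorted(reachable))
```

For this array type the operands can only land on element boundaries,
so an offset such as 4 bytes has no direct spelling without changing
the global's type.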

Pros:
- Can be used in portable IR

Cons:
- Yet another GEP-like value to deal with (cf. GEP, insertvalue, extractvalue)
- Can't easily express arbitrary offsets

(4) Symbol an operand:
@vt = linkonce_odr global [3 x i8*] [i8* @rtti, i8* @f1, i8* @f2], symbol i8** getelementptr ([3 x i8*]* @vt, i32 0, i32 1)

Similar to (3) except the getelementptr constant is explicit in the IR.
Or it could be a bitcast, inttoptr/add/ptrtoint etc.

Pros:
- Symbol operand looks "natural" (and can be used directly to get a
  symbol reference)
- Can express an arbitrary offset

Cons:
- Requires a circular reference, which could make construction slightly
  tricky
- We'd need to be careful about what kind of constants are permitted
  in the symbol operand, to ensure that the operation can be inverted
  to derive the IR-level symbol from the object-level symbol.

(5) Aliases:
@0 = linkonce_odr global [3 x i8*] [i8* @rtti, i8* @f1, i8* @f2]
@vt = linkonce_odr alias i8** getelementptr ([3 x i8*]* @0, i32 0, i32 1)

Pretty obvious what this is supposed to do.  The main novelty is the
nonzero operand in the gep.

Pros:
- Avoids extending the IR syntax (by much).

Cons:
- Getting offset aliases to behave like a global with a given
  linkage would be tricky, especially for COFF and linkonce_odr.
  (Consider: the COFF specification states that only one symbol per
  COMDAT section may have the magical linkonce properties.)  Further,
  the use of an alias to represent an offset is unprecedented (at the
  very least something like http://llvm-reviews.chandlerc.com/D701 is
  required), and it's not clear that all optimisers will respect this.
------------------------------------------------------------------------

I'm inclined to go with (2) or maybe (3).  Either would give us the
flexibility to upgrade to (1) if needed.

Thanks,
-- 
Peter


