[LLVMdev] [PATCH] Symbol offsets

Tue May 27 17:48:32 PDT 2014

I'm a little concerned we got prefix data wrong.  We had the following
motivating use cases:

1. Function prologue sigils, where we emit a special nop slide, maybe with
data in it.  Peter implemented a ubsan feature using this.

2. Function hotpatching, where we emit some padding before the function and a
special nop at the start of the function.  Typically the nop is 'mov edi, edi'
on x86 Windows, preceded by five bytes of padding for a long jump.  Profilers
can use this to turn instrumentation of a running binary on and off.
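For readers who haven't seen the hotpatch trick byte by byte, here is a minimal Python sketch of the layout described in item 2. It models the image as a bytearray and uses offsets in place of addresses. The helpers `unpatched` and `hotpatch` are hypothetical and not part of any toolchain. The x86 encodings are the standard ones: E9 rel32 is a near jmp, EB rel8 is a short jmp, and 8B FF is mov edi, edi.

```python
import struct

# Hypothetical sketch of the x86 Windows hotpatch layout. Offsets into the
# bytearray stand in for addresses; the helper names are made up.
PADDING = b"\xcc" * 5      # five filler bytes before the entry point (0xCC chosen arbitrarily here)
MOV_EDI_EDI = b"\x8b\xff"  # 'mov edi, edi': a two-byte nop at the entry point


def unpatched(body: bytes) -> bytearray:
    """Padding, then the hotpatchable entry point (offset 5), then the body."""
    return bytearray(PADDING + MOV_EDI_EDI + body)


def hotpatch(image: bytearray, entry: int, target: int) -> None:
    """Redirect the function whose entry point is at `entry` to `target`."""
    if entry < len(PADDING):
        raise ValueError("no room for the long jump before the entry point")
    if image[entry:entry + 2] != MOV_EDI_EDI:
        raise ValueError("entry point does not start with 'mov edi, edi'")
    jmp_at = entry - 5
    # Long jump (E9 rel32) in the padding; rel32 is relative to the end of the 5-byte jmp.
    image[jmp_at:entry] = b"\xe9" + struct.pack("<i", target - (jmp_at + 5))
    # Overwrite 'mov edi, edi' with a short jump (EB rel8) back to the long jump.
    # rel8 is relative to the end of this 2-byte jmp, so it is always -7 (0xF9).
    image[entry:entry + 2] = b"\xeb" + struct.pack("<b", jmp_at - (entry + 2))
```

The long jump goes into the padding first. What the entry point executes only changes with the final two-byte store over 'mov edi, edi'. A write that small is the usual reason this can be toggled safely in a running process.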

3. Tables-before-code, where the data is placed entirely before the code.  GHC
needs this.

In all cases, any code inside the prologue has no meaning to LLVM.
Inlining a function with a funky prologue is completely valid.

I worry that symbol_offset combined with prefix data is too low-level.  What
if we split this up into something like prefix data and "prologue" data?
Prefix data would be an arbitrary LLVM constant, and prologue data would be a
byte sequence of native executable code.  Something like:

define void @foo() prefix [2 x i8*] [i8* @a, i8* @b] prologue [4 x i8] c"\DE\AD\BE\EF" {
  ret void
}
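The following minimal Python model makes the proposed split concrete by showing where each piece would land in the emitted section. It assumes the prefix constant has already been lowered to bytes and ignores alignment. `emit_function` and `read_prefix` are hypothetical helpers, not LLVM APIs.

```python
from typing import NamedTuple


# Hypothetical model of the proposed layout: prefix data sits before the
# function's symbol, prologue bytes sit at the symbol, and the body follows.
class Emitted(NamedTuple):
    image: bytes  # bytes emitted into the section
    symbol: int   # offset of the function symbol (the call target) within `image`


def emit_function(prefix: bytes, prologue: bytes, body: bytes) -> Emitted:
    """Lay out prefix data, then the symbol, then prologue code, then the body."""
    return Emitted(prefix + prologue + body, len(prefix))


def read_prefix(fn: Emitted, size: int) -> bytes:
    """What a runtime would do: read `size` bytes immediately before the symbol."""
    if size > fn.symbol:
        raise ValueError("prefix data larger than what precedes the symbol")
    return fn.image[fn.symbol - size:fn.symbol]
```

In this model, calling through the symbol runs the prologue bytes and never falls into the prefix data. A runtime that wants the data, such as GHC with tables-before-code, reads it at a negative offset from the function pointer.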

I think the two forms are fundamentally equivalent as far as optimizations
like global constant propagation are concerned, but it'd be nice to have an
intuitive representation.  One of the strengths of LLVM's IL is that it's
comprehensible to mere mortal compiler engineers, and not just to computer
programs.

---

P.S. You could also represent this with aliases with a non-zero offset from
the beginning of the function.  Rafael is implementing this, but I don't
think that's a very good representation.  What does it mean to inline
through an alias to a function with a non-zero offset?  We could say that
we just ignore the offset for analysis purposes, but it doesn't feel very
clean.

On Tue, May 27, 2014 at 6:13 AM, Ben Gamari <bgamari.foss at gmail.com> wrote:

>
> Somehow this cover letter was dropped from my symbol offsets patch set:
>
>   1. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073200.html
>   2. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073201.html
>
>
> Original message
> -----------------
>
> About a year ago a proposal suggesting symbol offsets was brought to
> this list[1]. This proposal goes hand-in-hand with the prefix data
> proposal[2] which has now been implemented and I believe both of these
> arose in part due to GHC's requirement to place its info tables before
> symbol definitions[3]. Unfortunately, the current implementation of
> prefix data isn't terribly useful to GHC without symbol offsets[4,5].
>
> This weekend I implemented option (2) in the original proposal, then
> eventually implemented option (1) on top of this. Here is the
> result. Note that this can also be found on GitHub[6] for those who
> prefer.
>
> A review would be greatly appreciated. One known deficiency of this set
> is the lack of tests. Unfortunately, due to the use of temporary symbols
> it's not clear to me how this feature can be reliably tested. Ideas
> are welcome.
>
> Cheers,
>
> - Ben
>
>
> [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-April/061511.html
> [2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063909.html
> [3] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-February/047514.html
> [4] http://www.haskell.org/pipermail/ghc-devs/2013-September/002565.html
> [5] https://ghc.haskell.org/trac/ghc/ticket/4213#comment:12
> [6] https://github.com/bgamari/llvm/compare/symbol-offset
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>