[LLVMdev] LLVM GHC Backend: Tables Next To Code

Tue Feb 14 02:59:28 PST 2012

On Feb 13, 2012, at 6:49 AM, Sergiu Ivanov wrote:
> On behalf of GHC hackers, I would like to discuss the possibility of
> having a proper implementation of the tables-next-to-code optimisation
> in LLVM.

It would be great to have this.  However, the design will be tricky.  Is there anything that spells out how the TNTC optimization works at the actual machine instruction level?  It seems that there should be a blog post somewhere that shows the code with and without the optimization, but I can't find it offhand.

> This, obviously, requires certain
> ordering of data and text in the object code.  Since LLVM does not
> make it possible to explicitly control the placement of data and code,
> the necessary ordering is currently achieved by injecting GNU
> Assembler subsections on platforms supported by GNU Assembler.  Mac
> assembler, however, does not support this feature, so the resulting
> object code is post-processed directly.

It's interesting that you bring this up.  It turns out that the Mac toolchain (unless you disable .subsections_via_symbols, a gross hack) does not give you the ability to control the ordering of blobs of code separated by global labels (aka 'atoms' in the linker's terminology).  This is important because it enables link-time dead code elimination, profile-based code reordering, etc.  My understanding is that ELF toolchains don't have something like this, but it would be unfortunate if TNTC fundamentally prevented something like this from working.

Beyond this, the proposed model has some other issues: code ordering only makes sense within a linker section, but modeling "the table" and "the code" as two different LLVM values (a global value and a function) would tempt the optimizer to put them into different sections, run dead code elimination on them, etc.

> He proposes adding a "placebefore"
> attribute to global variables (or, similarly, a "placeafter" attribute
> for functions).  The corresponding example is:

This is a non-starter for a few reasons, but that doesn't mean that there aren't other reasonable options.  I'd really like to see the codegen that you guys are after to try to help come up with another suggestion that isn't a complete one-off hack for GHC. :)

One random question: have you considered placing the table *inside* of the function?  If the prologue for the closure was effectively:

Closure:
  jmp .LAfterTable    # jump over the embedded table
  .word ...           # table data
  .word ...
.LAfterTable:
  pushq %rbp          # normal prologue resumes here
  ...

then you can avoid a lot of problems.  I realize that this is not going to be absolutely as fast as your current TNTC implementation, but processors are *really really* good at predicting unconditional branches, so the cost is probably minimal, and it is likely to be much much faster than not having TNTC at all.

Getting even this to work will not be entirely straightforward, but again I'd like to understand more of what you're looking for from codegen to understand what the constraints are.

-Chris