[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)

Tue Jun 15 06:18:28 PDT 2010

Hi all,

Just wanted to report that I've found a second way to achieve
data/code layout (the first being the linker script that Eugene
mentioned).

The key is that GNU as supports a feature called subsections:

http://sourceware.org/binutils/docs-2.20/as/Sub_002dSections.html#Sub_002dSections

The way this works is that you can put code or data into a section like
'.text 2', where 2 is a subsection of .text. When assembling, 'as' lays
out the subsections of a section in ascending numeric order. So all you
need to do is arrange for the sidetable to be in section '.text n' and
the code in section '.text n+1'. Each function's sidetable and code get
their own pair of subsections. The nice thing is that this is purely a
GNU as feature. When it assembles the code into an object file, the
subsections are merged back into .text, so you don't get hundreds of
sections that take up space and slow down linking.
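To make the layout concrete, here is a minimal Python sketch (not GHC's actual code generator) that puts function k's side table in subsection 2k and its code in subsection 2k+1, together with a small model of how gas then orders the subsections. The Function record, the .quad table entries and all function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    table: list[int]  # hypothetical side-table words, emitted as .quad
    body: list[str]   # already-generated instructions


def assign_subsections(funcs):
    """Give function k's side table subsection 2k and its code 2k+1."""
    chunks = []
    for k, f in enumerate(funcs):
        chunks.append((2 * k, [f"\t.quad {w}" for w in f.table]))
        chunks.append((2 * k + 1, [f"{f.name}:"] + [f"\t{i}" for i in f.body]))
    return chunks


def emit(chunks):
    """Write the chunks in generation order, each after its '.text N' directive."""
    out = []
    for n, lines in chunks:
        out.append(f"\t.text {n}")
        out.extend(lines)
    return "\n".join(out) + "\n"


def gas_layout(chunks):
    """Model gas's placement: ascending subsection number; sorted() is
    stable, so chunks sharing a number keep the order they were written."""
    return [line for _, lines in sorted(chunks, key=lambda c: c[0]) for line in lines]


funcs = [Function("f", [1, 2], ["ret"]), Function("g", [3], ["nop", "ret"])]
chunks = assign_subsections(funcs)
# Even if the code generator writes the chunks in reverse order,
# each table still ends up immediately before its function's code.
print(gas_layout(list(reversed(chunks))))
```

Because gas does the reordering, the generator is free to emit a function's code before its table; only the subsection numbers decide the final layout.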

There is one complication though. LLVM (and GCC as well) doesn't
support subsections. While you can define which section globals and
functions are in, you can't specify the subsection. If you tell LLVM to
put function f in section "text 12", it produces assembly like:

.section text 12,"rw" @progbits
f:
 [..]

This causes gas to report a syntax error. Gas only accepts subsections
through a specific syntax, so it needs to be:

.text 12
f:
  [...]

We can convert from one form to the other with a simple regex, though.
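A sketch of such a rewrite in Python, assuming the directive looks like the example above (optionally with a leading dot on the section name). The regex and the function name mangle are illustrative only, not the filter GHC actually uses.

```python
import re

# Matches directives such as
#   .section text 12,"rw" @progbits
# capturing leading indentation and the subsection number. Accepting an
# optional dot before "text" and ignoring the trailing flags are
# assumptions about what the code generator may emit.
SECTION_RE = re.compile(r'^([ \t]*)\.section[ \t]+\.?text[ \t]+(\d+)\b.*$', re.MULTILINE)


def mangle(asm: str) -> str:
    """Rewrite '.section text N,...' lines into gas's '.text N' form."""
    return SECTION_RE.sub(r'\1.text \2', asm)


print(mangle('.section text 12,"rw" @progbits\nf:\n  ret\n'))
```

Lines that don't name a numbered text subsection (e.g. `.section .rodata`) are left untouched.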

We are going to use this approach in GHC for the moment; we've tested
it and it's working great so far. I prefer this method over the linker
script, as implementing the linker script approach would affect all the
backends GHC supports, while this approach is contained to the LLVM
backend.

I'm still planning on adding support to LLVM for side tables in some
manner so we can depend on pure LLVM.

Cheers,
David

On 10 June 2010 18:08, Andrew Lenharth <andrewl at lenharth.org> wrote:
> On Thu, Jun 10, 2010 at 11:34 AM, David Terei <davidterei at gmail.com> wrote:
>> It's good to see that a feature of this nature would be useful to a
>> whole range of people; I wasn't aware of that.
>>
>> On 9 June 2010 22:40, Andrew Lenharth <andrewl at lenharth.org> wrote:
>>> My argument amounts to expressing side tables as side tables in the IR
>>> rather than as an ordering on globals.  I think that would simplify
>>> the backend (a side table is something you discover from the function
>>> rather than having to check another global).  Also, if well specified,
>>> I think you could allow basic block labels in structures, which makes
>>> them more interesting for other uses.
>>
>> Sure. I wasn't set on the third approach I suggested, which is to have
>> them expressed as side tables in the IR, as I didn't realise other
>> users would be interested in them, so I didn't think it would be
>> appropriate to add new language constructs for one user. I don't think
>> it would be simpler to implement in the backend though, and this approach
>> would need changes to the frontend, so it is a lot more work.
>
> The backend already can sort of do this with the GCMetadataPrinter.
> Generalizing that to arbitrary side tables might be easier than adding
> a new construct (granted sidetables might not replace the ability to
> output assembly by that class, but they might do a lot of the heavy
> lifting).  Since GC lowering happens on the IR level (from the docs I
> looked at, I haven't personally dealt with GC yet), it may be possible
> to do a lot of lowering to generalized tables rather than complex
> GCMetadataPrinter implementations.  This is just speculation on my
> part though.  This is one of the reasons I thought labels in the
> constant structs could be handy.  Perhaps a general side table
> representation in the backend could be used by EH too?
>
> Andrew
>
>> What I am hoping someone may be able to answer, though, is
>> what issues there may be if the second approach was taken (using the
>> special global variable). Would the optimiser be tempted at some point to
>> replace a load from an unknown address, created by a negative
>> offset from a function, with unreachable, as Eugene
>> suggested may be possible?
>>
>> Also, what are you gaining by going with the third approach? I guess the
>> optimiser could do things like constant propagation with the third
>> approach but not the second, although I think that's unlikely to give
>> much benefit in the kind of code GHC produces, but there is everyone
>> else to think of :).
>>
>> Thanks for all the responses though, I'm going to start playing around
>> with some code and see what happens.
>>
>