[LLVMdev] Header in bitcode format 3.0?

Mon May 9 13:41:35 PDT 2011

On 9 May 2011 20:56, Samuel Crow <samuraileumas at yahoo.com> wrote:
> In the past I've worked on a PEG parser generator for any LLVM-based language to use.  One obstacle we ran into when generating LLVM IR assembly was that we'd end up cutting and pasting a list of declarations and aliases into every .ll file that needed to link with the others.  I'd propose that in the Bitcode 3.0 format, that a header definition be added to the IR assembly format using a FoldingSet to make sure that only unique headers are fetched recursively.  This would be primarily useful for making the bitcode a true virtual machine instead of just a pure intermediate representation of code written in other languages.

Hi Samuel,

That seems an interesting concept, more or less like guarded headers
in C or import mechanisms in Java or Python. However, it goes a bit
off track with regard to the IR.

The more semantics you add to the IR, the more complex the middle-end
needs to be to deal with the idiosyncrasies, and the less powerful the
compiler becomes. I for one always welcome changes in the IR regarding
readability and correct representation. I have proposed a few
modifications to the list (union types, more complex structures,
bit-fields, meta-attributes, sub-target properties, etc.) and failed
to convince people that they were worthwhile: the IR has to work with
any language and any target.

Note that this is not the same as saying that *the same* IR has to
work across languages and targets (as I originally thought). LLVM fails
to accomplish that, and it's clear how that hinders PNaCl's model.

But adding more semantics for your particular problem will complicate
things for others. Ultimately, I think there are only two feasible
approaches to changing the IR (apart from clear representation
changes, like exception handling and debug information):

1. Domain-specific wrappers: In your case, having a domain-specific
header-engine would enable you to distribute pieces of non-functional
IR and grab them via this mechanism to join in the target and execute
code in a less cumbersome way. That is less than ideal, but it doesn't
push deployment capabilities onto a clearly focused intermediate
representation, and it provides you with the state-machine you wanted.

2. Meta-IR: The IR that is compiled by the back-end doesn't
necessarily need to be the one your front-end generates. You can have
some meta-features (not metadata) in your IR, in the form of
instructions that you think are the right way of doing things but
that don't work as they stand. Then, just before the middle-end
starts, you run a few *correction* passes that understand your
meta-features and transform the IR into a less-readable, less-semantic
IR that the middle-end and the back-end understand.
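
To make option 1 a bit more concrete, here is a minimal sketch of what
such a header-engine could look like if it lived entirely outside the
IR. Everything in it is an assumption made for illustration: the
`;!include "name"` directive is hypothetical (it is just an IR comment,
not LLVM syntax), the function names are invented, and the "files" are
an in-memory dictionary so the example is self-contained. It expands
includes recursively and emits each header at most once, which is the
behaviour Samuel asked for.

```python
import re

# Hypothetical directive, NOT part of LLVM IR. It is written as an IR
# comment so that unprocessed files would still parse:
#   ;!include "name"
INCLUDE_RE = re.compile(r'^\s*;!include\s+"([^"]+)"\s*$')


def expand_headers(name, sources, seen=None):
    """Return the lines of `name` with each include expanded at most once."""
    if seen is None:
        seen = set()
    if name in seen:
        return []  # already emitted: behaves like an include guard
    seen.add(name)
    out = []
    for line in sources[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            # Recurse into the header, sharing the same `seen` set.
            out.extend(expand_headers(match.group(1), sources, seen))
        else:
            out.append(line)
    return out


# Stand-in for files on disk (assumed content, for illustration only).
SOURCES = {
    "libc.ll": "declare i32 @puts(i8*)",
    "util.ll": ';!include "libc.ll"\ndeclare void @log(i8*)',
    "main.ll": (
        ';!include "libc.ll"\n'
        ';!include "util.ll"\n'
        "define i32 @main() {\n"
        "  ret i32 0\n"
        "}"
    ),
}


def link_text(name, sources=SOURCES):
    """Produce one plain .ll text with all headers spliced in."""
    return "\n".join(expand_headers(name, sources))
```

The output of `link_text("main.ll")` is ordinary IR text with no
directives left in it, so nothing downstream needs to know the wrapper
exists.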

You can do both, and we have considered (but not implemented) the
second approach for our front-end representation. For now, we generate
the same thing as Clang and llvm-gcc, which is less than ideal.

One example is struct byval. The ARM back-end still doesn't support
struct byval (maybe now it does, I was away for a while), but it does
implement array byval. So we had to convert every structure into an
array pointer, changing the signature of every struct byval function,
and you can guess how delicate that makes the relationship with other
modules and so on. C++ structures, bit-fields and unions provide a
plethora of examples for messing up the IR representation, so all C++
front-ends could benefit from that pre-middle-end pass.
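
As a rough illustration of such a correction pass, the sketch below
rewrites struct byval parameters into byte-array byval parameters on a
toy model of function signatures. This is not LLVM's API and not
necessarily how our front-end did it: the `Param` and `Function`
classes, the `STRUCT_SIZES` table and the choice of `[N x i8]*` as the
replacement type are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Param:
    type: str            # textual type, e.g. "i32" or "%struct.Point*"
    byval: bool = False


@dataclass(frozen=True)
class Function:
    name: str
    params: tuple


# Assumed to be computed by the front-end: struct name -> size in bytes.
STRUCT_SIZES = {"%struct.Point": 8, "%struct.Big": 64}


def lower_param(param, struct_sizes):
    """Turn a byval struct pointer into a byval byte-array pointer."""
    pointee = param.type[:-1] if param.type.endswith("*") else None
    if param.byval and pointee in struct_sizes:
        return Param(f"[{struct_sizes[pointee]} x i8]*", byval=True)
    return param  # everything else is left untouched


def lower_byval_structs(functions, struct_sizes=STRUCT_SIZES):
    """Correction pass: rewrite every affected signature."""
    return [
        replace(f, params=tuple(lower_param(p, struct_sizes) for p in f.params))
        for f in functions
    ]
```

The sketch only touches signatures. A real pass would also have to
rewrite every call site and every use of the parameter inside the
function body, which is where the trouble with other modules comes
from.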

In essence, I'm proposing to wrap semantics around the IR, because
that gives you the freedom to implement your functionality without
loss of semantics, but that also probably means the IR will never grow
out of its scope. But the more I think of it, the more it makes sense
that it shouldn't. It's the same as run-time optimisation: once you
have done it for one machine, it doesn't make sense to transport that
IR to another machine, even if it's of the same type, because that
machine's use of it will be different.

I hope I haven't created more doubts, but that more or less answers
why the IR hasn't changed much for a while. Now, as for the cost of
keeping a third-party wrapper, it depends. I'd try to upstream the
wrapper (maybe as a plugin) rather than try to change the IR
structure.

My tuppence.

cheers,
--renato