[LLVMdev] Header in bitcode format 3.0?

Tue May 31 10:21:54 PDT 2011

>That seems an interesting concept, more or less like guarded headers 
in C or import mechanisms in Java or Python. However, that goes a bit 
off the track with regards to IR. 

>The more semantics you add to IR, the more complex the middle-end 
needs to be to deal with the idiosyncrasies and the less powerful is 
the compiler. I for one always welcome changes in the IR regarding 
readability and correct representation. I have proposed a few 
modifications to the list (union types, more complex structures, 
bit-fields, meta-attributes, sub-target properties, etc.) and failed 
to convince anyone that they would be worthwhile: IR has to work with 
any language and any target. 

>Note that that's not the same as saying that *the same* IR has to work 
across languages and targets (as I originally thought). LLVM fails to 
accomplish that, and it's clear how that hinders PNaCl's model. 

>But adding more semantics to your particular problem will complicate 
things for others. Ultimately, I think there are only two feasible 
approaches to change the IR (except clear representation changes, like 
exception handling and debug information): 

>1. Domain-specific wrappers: In your case, having a domain-specific 
header-engine would enable you to distribute pieces of non-functional 
IR and grab them via this mechanism to join in the target and execute 
code in a less cumbersome way. That is less than ideal, but it doesn't 
push deployment capabilities into a clearly focused intermediate 
representation, and it provides you with the state machine you wanted. 

>2. Meta-IR: The IR that is compiled by the back-end doesn't 
necessarily need to be the one your front-end generates. You can have 
some meta-features (not metadata) on your IR in the form of 
instructions that you think are the right way of doing things, but 
that don't work. Then, just before the middle-end starts, you create a 
few *correction* passes that understand your meta-features and 
transform the IR into a less-readable, less-semantic IR that the 
middle-end and the back-end understand. 

>You can do both, and we have considered (but not implemented) the 
second approach for our front-end representation. For now, we generate 
the same thing as Clang and llvm-gcc, which is less than ideal. 

--snip--

>In essence, I'm proposing to wrap semantics around the IR, because 
that gives you the freedom to implement your functionality without 
loss of semantics, but that also probably means the IR will never grow 
out of its scope. But the more I think of it, the more it makes sense 
not to. It's the same as run-time optimisation: once you have done it 
for one machine, it doesn't make sense to transport that IR to another 
machine, even if it's of the same type, because its use of it will be 
different. 

>I hope not to have created more doubts, but that more or less answers 
why the IR hasn't changed much for a while. Now, for the costs of 
keeping a third party wrapper, it depends. I'd try to upstream the 
wrapper (maybe as a plugin) rather than try to change the IR 
structure. 

>My tuppence. 

>cheers, 
>--renato

Hello Renato,

I've given some long, hard thought to how I would like to add the header block to the bitcode format.  The bitcode format already has 6-bit character string support.  I would like to add an optional block to the bitcode format, similar to the MODULE_CODE_DEPLIB record format, that would contain a list of 6-bit string records holding the filenames of the headers to be loaded by the bitcode reader.  The loader would read the block and add all of the filenames to a folding set and, once recursion into all of the files is complete, the headers would be linked into the module.
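
To make the idea concrete, here is a minimal sketch in Python rather than in the LLVM code base. It shows two things: the 6-bit (char6) alphabet as documented for the bitstream format, and a recursive walk over header lists that visits each filename once before anything is linked. The names `encode_char6`, `collect_headers` and `read_header_list` are hypothetical and exist only for illustration; a plain Python set stands in for the folding set, and no such block exists in the bitcode format today.

```python
import string

# char6 alphabet: 'a'-'z' = 0..25, 'A'-'Z' = 26..51, '0'-'9' = 52..61,
# '.' = 62, '_' = 63.
CHAR6 = string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"


def encode_char6(name):
    """Return the 6-bit values for name, or raise ValueError if a
    character falls outside the char6 alphabet."""
    values = []
    for ch in name:
        idx = CHAR6.find(ch)
        if idx < 0:
            raise ValueError("%r cannot be encoded as char6" % ch)
        values.append(idx)
    return values


def collect_headers(root, read_header_list):
    """Walk header lists recursively, visiting each file once.

    read_header_list is a hypothetical callback that returns the header
    filenames recorded in a given file. The result lists dependencies
    before the files that name them, so linking can start only after
    the whole walk has finished.
    """
    seen = set()   # stands in for the folding set of filenames
    order = []

    def visit(name):
        if name in seen:
            return          # already queued; this also breaks cycles
        seen.add(name)
        for dep in read_header_list(name):
            visit(dep)
        order.append(name)

    visit(root)
    return order
```

One consequence worth noting: the char6 alphabet has no path separator or hyphen, so a header filename that includes a directory could not be stored as a char6 string and would need a wider encoding.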

Since it would be an optional block, all of the existing bitcodes would be unaffected and likewise, only new code would contain the new block structure and even then, it would only appear if it is used by the frontend.  I want this to be an unobtrusive change, after all.  As far as writing my own wrapper, I'd just as soon not have to reinvent the LLVM Assembly parser just to add one optional keyword at the beginning followed by a comma-separated list of filenames.

Some of the other features you mentioned in your message, such as unions, would make the IR a high-level language, thus defeating the purpose of calling it low-level.  My proposal is different in that I'm just trying to make it compete with other system-specific Assembly language packages.  I do, however, intend to make it functional as a virtual machine to the point of being able to run bitcode on multiple platforms.  It has already been tested to work.  Perhaps the LLVM Wrapper project might not take off, but nonetheless it's very frustrating that the llvm-as program has no header include functionality of any sort.

My purpose here isn't to make more work for everyone.  Just the opposite.  Nonetheless, I'd like to know whether my patches will be rejected on the grounds Renato mentioned, or whether I may submit my series of patches to make this work.

My own two cents worth,

--Sam Crow