[LLVMdev] Sharing high-level type information between modules

Wed Jan 23 18:08:38 PST 2008

More questions about implementing a compiler for a high-level language:

Let's say we have a "headerless" language such as Java or C#, where the 
compiler gets high-level type information from the compiled object file 
rather than from header files, as in C/C++.

So, for example, if I have a module that imports symbols from other 
modules, the compiler would check whether those modules need to be 
recompiled. If so, they would be parsed and added to the compilation 
queue; otherwise, the compiler would simply parse their existing output 
files (which is presumably faster than reparsing and recompiling the 
source).
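The import-driven scheme described above can be sketched as a breadth-first walk over the import graph. This is a minimal sketch, not a real compiler driver: `plan_build`, the `imports` map and the `is_stale` predicate are hypothetical names, and how staleness is decided is deliberately left abstract.

```python
from collections import deque

def plan_build(root, imports, is_stale):
    """Split the modules reachable from `root` into those that must be
    compiled from source and those whose types can be read from output files.

    root:     the module being compiled (always compiled)
    imports:  hypothetical map of module name -> names of imported modules
    is_stale: hypothetical predicate; True if a module's output is out of date
    """
    to_compile = []  # parse and compile from source
    to_load = []     # read high-level types from the existing output file
    seen = {root}    # the seen-set keeps import cycles from looping forever
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if name == root or is_stale(name):
            to_compile.append(name)
        else:
            to_load.append(name)
        for dep in imports.get(name, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return to_compile, to_load
```

For example, with `app` importing `util` and `io`, both of which import `core`, and only `io` stale, `plan_build` returns `(["app", "io"], ["util", "core"])`. The `is_stale` predicate is where a per-dependency hash comparison, as described in the next paragraph, would plug in.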

What I'm envisioning is a compiler that converts each module into an 
intermediate form containing both the LLVM bitcode and a compressed 
encoding of the module's high-level types. This includes inline methods 
and templates, meaning that the contents of a module might affect the 
compilation of modules that import it. (Dealing with circular 
dependencies will be interesting but not unsolvable.) Each compiled 
module would also contain a list of the other modules it depends on, 
including a hash of each imported module's content, so that it would be 
relatively easy to determine when a module needs to be rebuilt.
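A compiled module's record and the rebuild test might look like the sketch below. It rests on several assumptions: `CompiledModule`, `needs_rebuild` and the field names are hypothetical; the module also stores a hash of its own source (not mentioned above, but needed to notice edits to the module itself); and each dependency hash covers only the imported module's high-level type information, on the theory that only that part can affect importers. Hashing the whole output file would also work, at the cost of more rebuilds.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict

def content_hash(data: bytes) -> str:
    """Stable digest used to detect changed content."""
    return hashlib.sha256(data).hexdigest()

@dataclass
class CompiledModule:
    """Hypothetical intermediate form: bitcode plus high-level types."""
    name: str
    source_hash: str    # assumption: hash of this module's own source text
    bitcode: bytes      # the LLVM bitcode, treated as opaque here
    type_info: bytes    # compressed high-level types, inline methods, templates
    # imported module name -> hash of its type_info when this module was compiled
    dep_hashes: Dict[str, str] = field(default_factory=dict)

def needs_rebuild(mod: CompiledModule, source: bytes,
                  current_type_info: Dict[str, bytes]) -> bool:
    """True if the module's own source or any imported module's types changed.

    current_type_info maps each imported module name to the type_info
    that module has right now; a missing entry counts as changed.
    """
    if content_hash(source) != mod.source_hash:
        return True
    for dep, recorded in mod.dep_hashes.items():
        current = current_type_info.get(dep)
        if current is None or content_hash(current) != recorded:
            return True
    return False
```

One consequence of hashing only the type information is that changing the body of a non-inline function in an imported module does not force its importers to be rebuilt, since that body lives in the bitcode rather than in `type_info`.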

These intermediate forms are then combined by a linker into a native 
binary, from which all of the high-level type information is stripped 
(except in debug builds).

So my first question is: do I need to come up with my own container 
format for LLVM bitcode, or is there a way to store the type and 
dependency information in the existing format? Or would it be better to 
have two output files per source file, one with the LLVM bitcode and 
one with the high-level type information? The latter strategy would 
allow the stock LLVM linker to be used to create the final application.
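To make the single-file option concrete, a home-grown container could be as simple as the sketch below. The "XMOD" layout (a magic tag, two little-endian 32-bit lengths, then the bitcode and the type blob) is entirely hypothetical and is not an LLVM format. Stock LLVM tools would not read such a file, so the bitcode would have to be extracted before linking, which is exactly the trade-off against two separate output files.

```python
import struct

MAGIC = b"XMOD"                   # hypothetical magic tag, not an LLVM format
_HEADER = struct.Struct("<4sII")  # magic, bitcode length, type-info length

def pack_module(bitcode: bytes, type_info: bytes) -> bytes:
    """Concatenate a fixed header, the bitcode and the type-info blob."""
    return _HEADER.pack(MAGIC, len(bitcode), len(type_info)) + bitcode + type_info

def unpack_module(blob: bytes):
    """Split a container back into (bitcode, type_info), checking the header."""
    magic, bc_len, ti_len = _HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a module container")
    start = _HEADER.size
    bitcode = blob[start:start + bc_len]
    type_info = blob[start + bc_len:start + bc_len + ti_len]
    if len(bitcode) != bc_len or len(type_info) != ti_len:
        raise ValueError("truncated container")
    return bitcode, type_info
```

With this layout, stripping the high-level types for a release build amounts to keeping only the first element returned by `unpack_module`.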

Also, any ideas or comments on the general subject would be welcome.

-- Talin