[LLVMdev] Packages

Sun Nov 16 18:59:02 PST 2003

> The point here is that XPL needs to keep track of what a given variable
> represents at the source level. If the compiler sees a map that is
> initially small it might represent it in LLVM assembly as a vector of
> pairs. Later on, it gets optimized into being a hash table. In order to
> do that and keep track of things, I need to know that the vector of
> pairs is >intended< to be a map, not simply a vector of pairs.

Absolutely.  No matter what source language you're interested in, you want
to know about _source_ variables/types/etc, not about LLVM variables,
types, etc.

> Another reason to do this is to speed up compilation time. XPL works
> similarly to Java in that you define a module and "import" other modules
> into it.  I do not want to recompile a module each time it is imported.

Makes sense.  On the LLVM side of the fence, we are planning on making the
JIT cache native translations, so you only need to pay the translation
cost the first time a function is executed.  This also plays into the
'offline compilation' idea.
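
To make the "pay only once" idea concrete, here is a toy sketch in Python.
It is not LLVM code: the class name, the translate callback, and the
in-memory dictionary are all made up for illustration, and whether a real
cache would live in memory or on disk is not decided here.

```python
class TranslationCache:
    """Translate each function at most once and reuse the result."""

    def __init__(self, translate):
        # Hypothetical callback: takes a function name, returns "native code".
        self._translate = translate
        self._cache = {}
        # Counts how many times the translator actually ran.
        self.translations = 0

    def get(self, name):
        # Only translate on a cache miss; later calls reuse the stored result.
        if name not in self._cache:
            self._cache[name] = self._translate(name)
            self.translations += 1
        return self._cache[name]
```

Asking for the same function twice runs the translator once; the second
call is just a dictionary lookup.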

> Since finding LLVM, I'm wondering if it wouldn't be better to store all
> the AST information in the bytecode file so that I don't have
> compilation information in one place and the code for it in another.
> To do this, I'd need support from LLVM to put "compile time information"
> into a bytecode or assembly file. This information would never be used
> at runtime and never "optimized out". It just sits in the bytecode file
> taking up space until some compiler (or other tool) asks for it.

Makes sense.   The LLVM bytecode file is packetized to specifically
support these kinds of applications.  The bytecode reader can skip over
sections it doesn't understand.  The unimplemented part is figuring out a
format to put this into the .ll file (probably just a hex dump or
something), and having the compiler preserve it through optimization.
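
A rough Python sketch of what "skip over sections it doesn't understand"
means in practice.  This is NOT the real bytecode layout: the section IDs,
the two 32-bit little-endian header fields, and the function names are
assumptions made up for the example.  The only point is that a length
prefix lets a reader step over data it cannot interpret.

```python
import struct

# Hypothetical section IDs; the real bytecode format has its own numbering.
KNOWN_SECTIONS = {1: "module", 2: "function"}


def write_section(section_id, payload):
    """Encode one section as: id (u32), length (u32), payload bytes."""
    return struct.pack("<II", section_id, len(payload)) + payload


def read_known_sections(data):
    """Return (name, payload) for known sections, skipping the rest."""
    found = []
    offset = 0
    while offset < len(data):
        section_id, length = struct.unpack_from("<II", data, offset)
        offset += 8  # size of the two header fields
        if section_id in KNOWN_SECTIONS:
            payload = data[offset:offset + length]
            found.append((KNOWN_SECTIONS[section_id], payload))
        # Unknown sections are stepped over using only the length field.
        offset += length
    return found
```

A file containing a section with ID 99 (say, a front-end's AST dump) reads
fine with this loop: the reader never looks inside that payload, and a
tool that does know ID 99 can pick it out the same way.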

>      5. Compile time information is defined as a set of global variables
>         just the same as for the runtime definitions. The full use of
>         LLVM Types (especially derived types like structures and
>         pointers) can be used to define the global variables.

If you just want to do this _today_ you already can.  We have an
"appending" linkage type which can make this very simple.  Basically
global arrays with appending linkage automatically merge together when
bytecode files are linked (just like 'sections' are merged in a traditional
linker).  If you want to implement your extra information using globals,
that is no problem; they will just always be loaded and processed.
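
The merging behaviour can be modeled in a few lines of Python.  This is
only a model of the idea, not the linker itself: each module is reduced to
a dict of appending arrays, the global name "xpl.compile_info" is made up,
and the checks a real linker does (for example on element types) are left
out.

```python
def link_modules(modules):
    """Merge the appending arrays of several modules.

    Each module is modeled as a dict mapping a global's name to a list of
    elements.  Arrays that share a name are concatenated in link order.
    """
    linked = {}
    for module in modules:
        for name, elements in module.items():
            # Same name: append to what earlier modules contributed.
            linked.setdefault(name, []).extend(elements)
    return linked
```

Linking {"xpl.compile_info": ["A"]} with {"xpl.compile_info": ["B"]} gives
a single "xpl.compile_info" array holding both entries, so each module can
contribute its own records under one shared name.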

>      6. There are never any naming conflicts between compile time
>         information variables in different modules. Each compile time
>         global variable is, effectively, scoped in its module. This
>         allows compiler writers to use the same name for various pieces
>         of data in every module emitted without clashing.

If you use the appending linkage mechanism, you _want_ them to have the
same name. :)

>      7. The exact same facility for dealing with module scoped types and
>         variables are used to deal with the compile time information.
>         When asked for it, the VMCore would produce a SymbolTable that
>         references all the global types and variables in the compile
>         time information.

If you use globals directly, you can just use the standard stuff.

>      8. LLVM assembler and bytecode reader will assure the syntactic
>         integrity of the compile time information as it would for any
>         other bytecode. It checks types, pointer references, etc. and
>         emits warnings (errors?) if the compiler information is not
>         syntactically valid.

How does it do this if it doesn't understand it?  I thought it would just
pass it through unmodified?

>      9. LLVM makes no assertions about the semantics or content of the
>         compile time information. It can be anything the compiler writer
>         wishes to express to retain compilation information. Correctness
>         of the information content (beyond syntactics) is left to the
>         compiler writer.  Exceptions to this rule may be warranted where

This seems to contradict #8.

>         there is general applicability to multiple source languages.
>         Debug (file & line number) info would seem to be a natural
>         exception.

Note that debug information doesn't work with this model.  In particular,
when the LLVM optimizer transmogrifies the code, it has to update the
debug information to remain accurate.  This requires understanding (at
some level) the debug format.

>     10. Compile time information sections are marked with a name that
>         relates to the high-level compiler that produced them. This
>         avoids confusion when one language attempts to read the compile
>         time information of another language.
>
> This is somewhat like an open ended, generalized ELF section for keeping
> track of compiler and/or debug information.  Because it's based on
> existing capabilities of LLVM, I don't think it would be particularly
> difficult to implement either.

There are two ways to implement this, as described above:
  1. Use global arrays of bytes or something.  If you want to, your arrays
     can even have pointers to global variables and functions in them.
  2. Use an untyped blob of data, attached to the .bc file.

#2 is better from the efficiency standpoint (it doesn't need to be loaded
if not used), but #1 is already fully implemented (it is used to implement
global ctor/dtors)...

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/