[LLVMdev] 9 Ideas To Better Support Source Language Developers

Wed Jan 7 12:55:01 PST 2004

On Tue, 6 Jan 2004, Reid Spencer wrote:
> A while back I promised to provide some feedback on useful extensions to
> LLVM to better support source language writers (i.e. those _using_ LLVM,
> not developing it). Below is a list of the ideas I've come up with so
> far.

Cool!  Ideas are always welcome!

> If you respond to this, please respond to each item in a separate
> message to the list.  That way we can keep track of different topics on
> different discussion threads.

I'll let you split it up as you see fit.  :)

> ------------------------------------------------------------------
> 1. Definition Import
> Source languages are likely to create lots of named type and value
> definitions for the memory objects the language manipulates. Redefining
> these in every module produces byte code bloat. It would be very useful
> for LLVM to natively support some kind of import capability that would
> properly declare global types and global values into the module being
> considered.

Unfortunately, this would break the ability to take a random LLVM bytecode
file and use it in a self-contained way.  In general, the type names and
external declarations are actually stored very compactly, and the
optimizers remove unused ones.  Is this really a problem for you in
practice?

> Even better would be a way to have this capability supported
> as a first-class citizen with some kind of "Import" class and/or
> instruction: simply point an Import class to an existing bytecode file
> and it causes the global declarations from that bytecode file to be
> imported into the current Module.

We already have this: the linker.  :)   Just put whatever you want into an
LLVM bytecode file, then use the LinkModules method (from
llvm/Transforms/Utils/Linker.h) to "import" it.  Alternatively, in your
front-end, you could just start with this module instead of an empty one
when you compile a file...

> ------------------------------------------------------------------
> 2. Memory Management
>
> My programming system (XPS) has some very powerful and efficient memory
> allocation mechanisms built into it. It would be useful to allow users
> of LLVM to control how (and more importantly where) memory is allocated
> by LLVM.

What exactly would this be used for?  Custom allocators for performance?
Or something more important?  In general, custom allocators for
performance are actually a bad idea...

> ------------------------------------------------------------------
> 3. Code Signing Support
>
> One of the requirements for XPL is that the author and/or distributor of
> a piece of software be known before execution and that there is a way to
> validate the integrity of the bytecodes.  To that end, I'm planning on
> providing message digesting and signing on LLVM bytecode files. This is
> pretty straightforward to implement. The only question is whether it
> really belongs in LLVM or not.

I don't think that this really belongs in LLVM itself: Better would be to
wrap LLVM bytecode files in an application-specific (i.e., XPL) file format
that includes the digest of the module, the bytecode itself, and whatever
else you wanted to keep with it.  That way your tool, when commanded to
load a file, would check the digest, and only if it matches call the LLVM
bytecode loader.
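For illustration, here is a rough C++ sketch of such a wrapper.  The
container layout (a 4-byte "XPLW" magic, an 8-byte digest, then the raw
bytecode) and the names wrapBytecode/unwrapVerified are hypothetical.
FNV-1a stands in for the digest only to keep the example self-contained:
it catches accidental corruption but not deliberate tampering, so a real
scheme would use a cryptographic hash plus a signature.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Stand-in digest (64-bit FNV-1a). Detects accidental corruption only; it is
// NOT a substitute for a cryptographic hash and signature.
std::uint64_t fnv1a64(const Bytes& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::uint8_t b : data) {
        h ^= b;
        h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
    }
    return h;
}

// Hypothetical layout: "XPLW" | 8-byte little-endian digest | bytecode.
Bytes wrapBytecode(const Bytes& bytecode) {
    Bytes out = {'X', 'P', 'L', 'W'};
    const std::uint64_t digest = fnv1a64(bytecode);
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<std::uint8_t>(digest >> (8 * i)));
    out.insert(out.end(), bytecode.begin(), bytecode.end());
    assert(out.size() == bytecode.size() + 12);
    return out;
}

// Returns the bytecode only if the magic and the digest both check out.
std::optional<Bytes> unwrapVerified(const Bytes& file) {
    if (file.size() < 12 || file[0] != 'X' || file[1] != 'P' ||
        file[2] != 'L' || file[3] != 'W')
        return std::nullopt;
    std::uint64_t stored = 0;
    for (int i = 0; i < 8; ++i)
        stored |= static_cast<std::uint64_t>(file[4 + i]) << (8 * i);
    Bytes payload(file.begin() + 12, file.end());
    if (fnv1a64(payload) != stored)
        return std::nullopt;
    return payload;
}
```

Only when unwrapVerified returns a value would the tool hand the bytes on
to the LLVM bytecode reader.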

> There's one issue with code signing: it thwarts global optimization
> because changing the bytecode means changing the signature.  While the
> software's author can always do this, a signed bytecode file could not
> be globally optimized into another program without breaking the
> signature.  It would probably be acceptable to allow LLVM to modify the
> bytecode in memory at runtime after decryption and verification of
> the signature.

I'm not sure that there is a wonderful solution to this.  You could go the
route of having a "trusted" compiler, which has the necessary keys built
into it or something, but I don't know very much about this area.

> ------------------------------------------------------------------
> 4. Threading Support
>
> Some low level support for threading is needed. I think there are really
> just a very few primitives we need from which higher order things can be
> constructed. One is a memory barrier to ensure cache is flushed, etc. so
> we can be certain a write to memory has "taken".

Just out of curiosity, what do you need a membar for?  The only thing
that I'm aware of it being useful for (besides implementing threading
packages) are Read-Copy-Update algorithms.

> This goes beyond the current volatile support and will need to access
> specific machine instructions if a native barrier is supported. Another
> is a thread forking instruction. I'd like to see TLS supported but that
> can probably be constructed from lower level primitives.  A nice-to-have
> would be critical section support. This could be done similarly to Java's
> monitorenter and monitorexit instructions.  If I recall correctly, this
> capability is currently being worked on.

Yup, Misha is currently working on adding these capabilities to LLVM.  In
the meantime, calling into a pthreads library directly is the preferred
solution.  I agree that TLS would be very handy to have.
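As a sketch of the pthreads route, a front-end could lower Java-style
monitorenter/monitorexit to calls into small runtime helpers built on POSIX
mutexes.  Monitor, monitorEnter, monitorExit and runThreads are
hypothetical runtime names, not part of LLVM; only the pthread_* calls are
the real POSIX API.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical runtime object backing a Java-style monitor.
class Monitor {
public:
    Monitor() { pthread_mutex_init(&mutex_, nullptr); }
    ~Monitor() { pthread_mutex_destroy(&mutex_); }
    Monitor(const Monitor&) = delete;
    Monitor& operator=(const Monitor&) = delete;
    pthread_mutex_t* handle() { return &mutex_; }

private:
    pthread_mutex_t mutex_;
};

// What a front-end might emit calls to in place of monitorenter/monitorexit.
void monitorEnter(Monitor& m) {
    int rc = pthread_mutex_lock(m.handle());
    assert(rc == 0);
    (void)rc;
}

void monitorExit(Monitor& m) {
    int rc = pthread_mutex_unlock(m.handle());
    assert(rc == 0);
    (void)rc;
}

// Shared state guarded by the monitor.
struct Counter {
    Monitor monitor;
    long value = 0;
};

// Thread entry point; pthread_create expects void* (*)(void*).
void* incrementMany(void* arg) {
    Counter* c = static_cast<Counter*>(arg);
    for (int i = 0; i < 100000; ++i) {
        monitorEnter(c->monitor);
        ++c->value;  // critical section
        monitorExit(c->monitor);
    }
    return nullptr;
}

// Spawns n threads (1..16) that all increment the same counter.
long runThreads(int n) {
    assert(n > 0 && n <= 16);
    Counter counter;
    pthread_t threads[16];
    for (int i = 0; i < n; ++i)
        pthread_create(&threads[i], nullptr, incrementMany, &counter);
    for (int i = 0; i < n; ++i)
        pthread_join(threads[i], nullptr);
    return counter.value;
}
```

Because every increment happens between monitorEnter and monitorExit, the
final count is exact regardless of how the threads interleave.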

> ------------------------------------------------------------------
> 5. Fully Developed ByteCode Archives
>
> XPL programs are developed into packages. Packages are the unit of
> deployment and as such I need a way to (a) archive several bytecode
> files together, (b) index the globals in them, and (c) compress the
> whole thing with bzip2.  Although LLVM has some support for this today
> with the llvm-ar program, I don't believe it supports (b) and (c).

This makes a lot of sense.  The LLVM bytecode reader supports loading a
bytecode file from a memory buffer, so I think it would be pretty easy to
implement this.  Note that llvm-ar is currently a work-in-progress, but it
might make sense to implement support for this directly in it.  After all,
we aren't constrained in what the ".o" members of the .a file look like
(as long as gccld and llvm-nm support the format).
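A minimal sketch of point (b), assuming a hypothetical in-memory layout
rather than llvm-ar's actual format: each member keeps its raw bytecode
(which could be handed to the reader from a memory buffer) plus the list of
globals it defines, and the archive keeps a name-to-member index.  bzip2
compression of the serialized archive is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One archive member (hypothetical layout, not llvm-ar's).
struct Member {
    std::string name;                  // e.g. "list.bc"
    std::vector<std::uint8_t> bytes;   // raw bytecode, loadable from memory
    std::vector<std::string> globals;  // globals this member defines
};

class BytecodeArchive {
public:
    void add(Member m) {
        const std::size_t idx = members_.size();
        for (const std::string& g : m.globals) {
            // emplace keeps an existing entry, so the first definition wins
            // (a choice made for this sketch).
            symbolIndex_.emplace(g, idx);
        }
        members_.push_back(std::move(m));
    }

    // Finds the member defining `global` without scanning every member.
    std::optional<std::size_t> find(const std::string& global) const {
        auto it = symbolIndex_.find(global);
        if (it == symbolIndex_.end()) return std::nullopt;
        return it->second;
    }

    const Member& member(std::size_t i) const {
        assert(i < members_.size());
        return members_[i];
    }

private:
    std::vector<Member> members_;
    std::map<std::string, std::size_t> symbolIndex_;  // global -> member
};
```

A linker resolving an undefined global would consult find() and load only
the matching member's bytes.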

> Note that bytecode files compress to about 50% with bzip2 which means
> faster transmission times to their destinations (oh, did I mention that
> XPL supports distributed programming? :)  The resulting archive program
> would be more similar to jar/tar than to ar.

Also note that we are always interested in finding ways to shrink the
bytecode files.  Right now they are basically comparable in size to native
executables, but smaller is always better!

> ------------------------------------------------------------------
> 6. Incremental Code Generation
>
> The conventional wisdom for compilation is to emit object code (or in
> our case the byte code) from a compiler incrementally on a per-function
> basis. This is necessary so that one doesn't have to keep the memory for
> every function around for the entire compilation. This allows much

That makes sense.

> I'm not sure if LLVM supports this now, but I'd like LLVM to be able to
> write byte code for an llvm::Function object and then "drop" the
> function's body and carry on. It isn't obvious from llvm::Function's
> interface if this is supported or not.

This has not yet been implemented, but a very similar thing has:
incremental bytecode loading.  The basic idea is that you can load a
bytecode file without all of the function bodies.   As you need the
contents of a function body, it is streamed in from the bytecode file.
Misha added this for the JIT.

Doing the reverse seems very doable, but no one has tried it.  If you're
interested, take a look at the llvm::ModuleProvider interface and the
implementations of it to get a feeling for how the incremental loader
works.

> The only drawback to this is the effect on optimization. I would suggest
> that after bytecode generation, a function's "body" be replaced with
> some kind of summary (annotation?) of interest to optimization passes.

This is very similar to the functionality required for the incremental
loader, so when it gets developed for the loader, the writer could use
similar kinds of interfaces.
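A toy C++ model of the "write, then drop" idea, together with the reverse
step of streaming a body back in on demand, might look like this.
Function, Summary and IncrementalWriter are hypothetical stand-ins rather
than llvm::Function or llvm::ModuleProvider, the "bytecode" is plain text,
and the instruction count is just one example of what a summary could hold.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical summary kept after the body is dropped.
struct Summary {
    std::size_t instructionCount = 0;  // one example of a fact a pass might use
};

// Hypothetical function: not llvm::Function.
struct Function {
    std::string name;
    std::optional<std::vector<std::string>> body;  // nullopt once dropped
    Summary summary;                               // survives the drop
};

class IncrementalWriter {
public:
    // Writes the body out, records a summary, then frees the body.
    void emitAndDrop(Function& f) {
        assert(f.body.has_value());
        std::string text;
        for (const std::string& inst : *f.body) text += inst + '\n';
        emitted_[f.name] = std::move(text);
        f.summary.instructionCount = f.body->size();
        f.body.reset();  // destroys the vector and releases its storage
    }

    // Reverse direction: streams a dropped body back in on demand.
    void materialize(Function& f) const {
        if (f.body) return;  // already in memory
        std::vector<std::string> insts;
        std::istringstream in(emitted_.at(f.name));
        for (std::string line; std::getline(in, line);) insts.push_back(line);
        f.body = std::move(insts);
    }

private:
    // Stands in for the output file: function name -> emitted text.
    std::map<std::string, std::string> emitted_;
};
```

Between emitAndDrop and materialize, only the name and summary stay in
memory, which is the property the per-function writer is after.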

> Taking the above suggestion to its logical conclusion, it might be
> useful to create a general mechanism for passes to leave "tidbits" of
> information around for other passes. The Annotation mechanism probably
> could be used for this purpose but something a little more formal would
> probably be better. It's highly likely there's something like this in
> place already that I'm not aware of.

LLVM already has an llvm::Annotation class that does exactly this :)

> ------------------------------------------------------------------
> 7. Idioms Package
>
> As I learned from Stacker (the hard way), there are certain idioms that
> occur in using LLVM over and over again. These idioms need to be either
> (a) documented or (b) implemented in a library.  I prefer (b) because it
> implies (a) ;>  Such idioms as if-then-else, for (pre; cond; post),
> while(cond), etc. should be just coded into a framework so that compiler
> writers have a slightly higher level interface to work with.
>
> Although I like this idea, it's low on my list because I regard LLVM as
> _already_ incredibly easy to use as a compiler writer's tool. But, hey,
> why stop at "incredibly easy" when there's "amazingly trivial" waiting
> in the wings?

Developing a new "front-end helper" library could be interesting!  The
only challenge would be to make it general purpose enough that it would
actually be useful for multiple languages.
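One possible shape for such a helper, sketched in C++: a builder that
allocates fresh labels and emits the conditional branch, both arms, and the
join block for if-then-else, so nested conditionals never clash.
IdiomBuilder and its methods are hypothetical, and the output is simplified
LLVM-like text rather than IR built through LLVM's C++ API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical front-end helper; emits simplified LLVM-like text lines.
class IdiomBuilder {
public:
    using Emitter = std::function<void(IdiomBuilder&)>;

    void emit(const std::string& line) { lines_.push_back(line); }

    // if (cond) { thenArm } else { elseArm }, with fresh labels per use.
    void ifThenElse(const std::string& cond, const Emitter& thenArm,
                    const Emitter& elseArm) {
        assert(!cond.empty());
        // Take the id before running the arms so nested uses get new labels.
        const std::string id = std::to_string(counter_++);
        const std::string thenL = "then" + id, elseL = "else" + id,
                          endL = "endif" + id;
        emit("br bool " + cond + ", label %" + thenL + ", label %" + elseL);
        emit(thenL + ":");
        thenArm(*this);
        emit("br label %" + endL);
        emit(elseL + ":");
        elseArm(*this);
        emit("br label %" + endL);
        emit(endL + ":");
    }

    const std::vector<std::string>& lines() const { return lines_; }

private:
    std::vector<std::string> lines_;
    unsigned counter_ = 0;
};
```

Loops (while, for) would follow the same pattern: the helper owns the label
bookkeeping and the caller supplies only the condition and the bodies.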

> ------------------------------------------------------------------
> 8. Create a ConstantString class
>
> Constant strings are very common occurrences in XPL and probably are in
> other source languages as well. The current implementation of
> ConstantArray::get(std::string&) is a bit weak. It creates a
> ConstantSInt for every character. What if the strings are long and the
> program creates many of them? It seems a little heavy weight to me.

This is something that might make sense to deal with in the future, but it
has a lot of implications in the compiler and optimizer.  Look at GCC, for
example: there are many optimizations that work on constant strings but
not on arrays of characters or any other type.  At this stage in the game,
effort is probably best spent elsewhere.  :)

On the other hand, adding a hack to the bytecode format to efficiently
encode strings is something that I have been considering: there the effect
of the change is more contained.
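To make the size argument concrete, this sketch compares a hypothetical
element-wise encoding (one tagged entry per character) with a single
length-prefixed byte run.  Both encodings, including the 1-byte tag, are
illustrative assumptions, not the actual LLVM bytecode format; the
variable-length integer is an unsigned LEB128-style scheme.  For an
11-character ASCII string the element-wise form takes 23 bytes and the
string form 12.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Unsigned LEB128-style variable-length integer: 7 payload bits per byte,
// high bit set on every byte except the last.
void writeVBR(Bytes& out, std::uint64_t v) {
    do {
        auto byte = static_cast<std::uint8_t>(v & 0x7F);
        v >>= 7;
        if (v != 0) byte |= 0x80;
        out.push_back(byte);
    } while (v != 0);
}

std::uint64_t readVBR(const Bytes& in, std::size_t& pos) {
    std::uint64_t v = 0;
    for (unsigned shift = 0;; shift += 7) {
        assert(pos < in.size() && shift < 64);
        const std::uint8_t byte = in[pos++];
        v |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return v;
    }
}

// Hypothetical element-wise form: element count, then each character as its
// own tagged constant (1-byte tag + variable-length value).
Bytes encodeElementWise(const std::string& s) {
    Bytes out;
    writeVBR(out, s.size());
    for (unsigned char c : s) {
        out.push_back(0x01);  // hypothetical "character constant" tag
        writeVBR(out, c);
    }
    return out;
}

// Hypothetical string form: length, then the raw bytes in one run.
Bytes encodeString(const std::string& s) {
    Bytes out;
    writeVBR(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
    return out;
}

std::string decodeString(const Bytes& in) {
    std::size_t pos = 0;
    const auto len = static_cast<std::size_t>(readVBR(in, pos));
    assert(len <= in.size() - pos);
    const std::uint8_t* first = in.data() + pos;
    return std::string(first, first + len);
}
```

The change stays local to the reader and writer: in memory the constant
could still be an array of characters, which is why this is more contained
than a new ConstantString class.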

> ------------------------------------------------------------------
> 9. More Native Platforms Supported
>
> To get the platform coverage that I need, I'm making the XPL compiler
> use the C back end. It's slower to compile that way but I'll only need it
> for those programs that want to go fully native. The back end support in
> LLVM is a bit weak right now in terms of both optimizations available
> and platforms supported. This isn't a big priority for me as there is a
> viable alternative to native platform support.

Yup, that makes sense.  Supporting the CBE will always be a good idea, but
adding new native platforms and improving the ones we do will be
increasingly important over time.  :)

> ------------------------------------------------------------------
>
> I'll do another one of these postings as I get nearer to the end of the
> XPL Compiler implementation. There should be lots more ideas by then.
> Don't hold your breath :)

Cool, keep us informed!  :)

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/