[LLVMdev] 9 Ideas To Better Support Source Language Developers

Tue Jan 6 05:05:01 PST 2004

A while back I promised to provide some feedback on useful extensions to
LLVM to better support source language writers (i.e. those _using_ LLVM,
not developing it). Below is a list of the ideas I've come up with so
far.

As I get more of XPL's compiler done, I'll start diving into each of
these areas.  I'm posting early in the hopes that discussion will bear
some fruit.  In discussing these things, I'm mostly interested in
learning whether any of the following ideas should or should not be part
of LLVM as opposed to part of XPS.

DISCLAIMER:
If any of the following items are already implemented, I missed it! So,
please enlighten me!

NOTE: 
If you respond to this, please respond to each item in a separate
message to the list.  That way we can keep track of different topics on
different discussion threads. I doubt you'll want to do that, however:
these are all great ideas and should just be adopted without further
discussion! :)))) <kidding!>

The following items are ranked roughly in order of importance to _me_.
Feel free to rank them for your needs -- it would be interesting to see
what's important to others.

------------------------------------------------------------------
1. Definition Import

Source languages are likely to create lots of named type and value
definitions for the memory objects the language manipulates. Redefining
these in every module produces byte code bloat. It would be very useful
for LLVM to natively support some kind of import capability that would
properly declare global types and global values into the module being
considered. Even better would be a way to have this capability supported
as a first-class citizen with some kind of "Import" class and/or
instruction: simply point an Import class to an existing bytecode file
and it causes the global declarations from that bytecode file to be
imported into the current Module.
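
A rough sketch of what such an import could do, using toy stand-ins rather
than LLVM's real classes. Module, Global and importDeclarations are all
hypothetical names invented for illustration; the point is only that the
importing module receives declarations, not copies of the definitions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-ins, NOT the llvm:: classes of the same name.
struct Global {
  std::string type;    // printed form of the type, for illustration only
  bool isDefinition;   // false means "declared here, defined elsewhere"
};

struct Module {
  std::map<std::string, Global> globals;
};

// Hypothetical helper: copy every global of `from` into `into` as a
// declaration. Names already present in `into` are left untouched.
// Returns the number of names actually imported.
int importDeclarations(const Module &from, Module &into) {
  int imported = 0;
  for (const auto &entry : from.globals) {
    if (into.globals.count(entry.first))
      continue;  // keep the importer's own version
    into.globals[entry.first] = Global{entry.second.type, false};
    ++imported;
  }
  return imported;
}
```

A real implementation would read the declarations from a bytecode file and
would have to decide what to do about conflicting types; the sketch skips
both.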

------------------------------------------------------------------
2. Memory Management

My programming system (XPS) has some very powerful and efficient memory
allocation mechanisms built into it. It would be useful to allow users
of LLVM to control how (and more importantly where) memory is allocated
by LLVM. This is another pretty large, sweeping change that will affect
every "new" in LLVM.  The various Class::get() methods would need to be
altered as well as the various createXXX functions. To minimize the
impact, we would subclass every root class in the LLVM inheritance
hierarchy from some "Allocatee" class that implements operators new and
delete. This base class would handle dispatching operator new to a
user-installed version, if provided. Otherwise it just defaults to
::new. This trick means we don't have to change _every_ "new" call, just
the ones that allocate things outside of the llvm namespace.
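
A minimal sketch of the "Allocatee" idea in modern C++. Only the name
Allocatee comes from the text above; AllocHooks, setAllocHooks, the
counting allocator and the stand-in Value class are assumptions made up
for the example, not existing LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical pair of user-supplied allocation functions.
struct AllocHooks {
  void *(*alloc)(std::size_t);
  void (*release)(void *);
};

class Allocatee {
public:
  // Install both hooks, or pass {nullptr, nullptr} to restore the default.
  // Assumption: hooks are not swapped while objects are still live,
  // otherwise memory would be released by the wrong allocator.
  static void setAllocHooks(AllocHooks h) { hooks() = h; }

  static void *operator new(std::size_t n) {
    if (hooks().alloc) {
      if (void *p = hooks().alloc(n))
        return p;
      throw std::bad_alloc();
    }
    return ::operator new(n);  // default path
  }

  static void operator delete(void *p) noexcept {
    if (!p)
      return;
    if (hooks().release)
      hooks().release(p);
    else
      ::operator delete(p);
  }

  virtual ~Allocatee() = default;

private:
  static AllocHooks &hooks() {
    static AllocHooks h{nullptr, nullptr};
    return h;
  }
};

// Stand-in for a root class of the hierarchy (not the real llvm::Value).
class Value : public Allocatee {
public:
  int payload = 0;
};

// Example user allocator that just counts calls.
int g_allocs = 0;
int g_frees = 0;
void *countingAlloc(std::size_t n) { ++g_allocs; return std::malloc(n); }
void countingFree(void *p) { ++g_frees; std::free(p); }
```

With the hooks installed, "new Value" and "delete" on the result go through
countingAlloc and countingFree without the call sites changing at all.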

------------------------------------------------------------------
3. Code Signing Support

One of the requirements for XPL is that the author and/or distributor of
a piece of software be known before execution and that there is a way to
validate the integrity of the bytecodes.  To that end, I'm planning on
providing message digesting and signing on LLVM bytecode files. This is
pretty straightforward to implement. The only question is whether it
really belongs in LLVM or not. Note that code signing is pretty much a
standard part of Java these days. There's one issue with code signing:
it thwarts global optimization because changing the byte code means
changing the signature.  While the software's author can always do this,
a signed bytecode file could not be globally optimized into another
program without breaking the signature.  It would probably be acceptable
to allow LLVM to modify the bytecode in memory at runtime after
decryption and verification of the signature.
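
To make the optimization problem concrete, here is a toy sketch of the
digest check. It deliberately uses FNV-1a, which is NOT a cryptographic
hash, as a stand-in for a real message digest, and it omits the actual
signing of the digest with a private key. SignedBytecode, seal and verify
are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in digest: 64-bit FNV-1a. A real design would use a
// cryptographic hash; this only illustrates the control flow.
std::uint64_t toyDigest(const std::string &bytes) {
  std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (unsigned char c : bytes) {
    h ^= c;
    h *= 0x100000001b3ULL;                  // FNV prime
  }
  return h;
}

// Hypothetical container: the bytecode plus the digest recorded at
// signing time.
struct SignedBytecode {
  std::string bytes;
  std::uint64_t digest;
};

SignedBytecode seal(const std::string &bytes) {
  return SignedBytecode{bytes, toyDigest(bytes)};
}

// True only if the bytes are exactly what was sealed.
bool verify(const SignedBytecode &s) {
  return toyDigest(s.bytes) == s.digest;
}
```

Any transformation of the stored bytes, including a perfectly valid
optimization, makes verify() fail, which is why rewriting would have to
happen in memory after verification.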

------------------------------------------------------------------
4. Threading Support

Some low level support for threading is needed. I think there are really
just a very few primitives we need from which higher order things can be
constructed. One is a memory barrier to ensure cache is flushed, etc. so
we can be certain a write to memory has "taken". This goes beyond the
current volatile support and will need to access specific machine
instructions if a native barrier is supported. Another is a thread
forking instruction. I'd like to see TLS supported but that can probably
be constructed from lower level primitives.  A nice-to-have would be
critical section support. This could be done similarly to Java's
monitorenter and monitorexit instructions.  If I recall correctly, I
believe this capability is being worked on currently.
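
For illustration, this is roughly what the barrier primitive buys a source
language, written with modern C++ atomics rather than anything LLVM
provides. The names payload, ready, publish and tryConsume are invented
for the example; publish is meant to run on one thread and tryConsume on
another.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag announcing that payload is valid

// Writer side: the release fence keeps the payload write from being
// reordered after the flag store.
void publish(int value) {
  payload = value;
  std::atomic_thread_fence(std::memory_order_release);
  ready.store(true, std::memory_order_relaxed);
}

// Reader side: once the flag is seen, the acquire fence guarantees the
// payload write is visible too. Returns false if nothing was published.
bool tryConsume(int &out) {
  if (!ready.load(std::memory_order_relaxed))
    return false;
  std::atomic_thread_fence(std::memory_order_acquire);
  out = payload;
  return true;
}
```

A plain volatile write gives no such ordering guarantee between the two
stores, which is the gap the proposed primitive would close.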

------------------------------------------------------------------
5. Fully Developed ByteCode Archives

XPL programs are developed into packages. Packages are the unit of
deployment and as such I need a way to (a) archive several bytecode
files together, (b) index the globals in them, and (c) compress the
whole thing with bzip2.  Although LLVM has some support for this today
with the llvm-ar program, I don't believe it supports (b) and (c). Note
that bytecode files compress to about 50% with bzip2 which means faster
transmission times to their destinations (oh, did I mention that XPL
supports distributed programming? :)  The resulting archive program
would be more similar to jar/tar than to ar.
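
Point (b) could be as simple as a table from global name to the archive
member that defines it. The sketch below uses hypothetical types (Member,
PackageArchive) and leaves out compression and the on-disk format
entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical archive member: a bytecode file plus the globals it defines.
struct Member {
  std::string name;
  std::vector<std::string> globals;
  std::string bytes;  // raw bytecode, unused in this sketch
};

class PackageArchive {
public:
  // Add a member and index its globals. If two members define the same
  // name, the first one added wins (an arbitrary choice for the sketch).
  void add(const Member &m) {
    members_.push_back(m);
    const std::size_t slot = members_.size() - 1;
    for (const auto &g : m.globals)
      index_.emplace(g, slot);  // emplace does not overwrite existing keys
  }

  // Find the member defining `global`, or nullptr if there is none.
  // The pointer is invalidated by a later call to add().
  const Member *findDefiner(const std::string &global) const {
    auto it = index_.find(global);
    return it == index_.end() ? nullptr : &members_[it->second];
  }

private:
  std::vector<Member> members_;
  std::map<std::string, std::size_t> index_;
};
```

With such an index a linker or loader can pull in only the members it
needs instead of scanning every file in the package.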

------------------------------------------------------------------
6. Incremental Code Generation

The conventional wisdom for compilation is to emit object code (or in
our case the byte code) from a compiler incrementally on a per-function
basis. This is necessary so that one doesn't have to keep the memory for
every function around for the entire compilation. This allows much
larger programs to be compiled since the memory limit is relative to the
size of a single function rather than the size of the whole program. My
language, XPL, will result in the compilation of huge programs because
it is essentially a language to support program generation. I'm not sure
if LLVM supports this now, but I'd like LLVM to be able to write byte
code for an llvm::Function object and then "drop" the function's body
and carry on. It isn't obvious from llvm::Function's interface if this
is supported or not.  The only drawback to this is the effect on
optimization. I would suggest that after bytecode generation, a
function's "body" be replaced with some kind of summary (annotation?) of
interest to optimization passes. The summary would contain indications
of whether the function calls anything else, modifies global memory,
etc. That way the relevant information for optimization passes can be
retained while all the gory details aren't.
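
A sketch of the write-then-drop idea with a summary left behind. The
Function type here is a toy stand-in, not llvm::Function, and
FunctionSummary, summarize and emitAndDrop are hypothetical names;
instructions are plain strings so the example stays self-contained.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical summary kept after the body is gone.
struct FunctionSummary {
  bool callsOthers = false;
  bool writesGlobals = false;
};

// Toy stand-in for a function: a name plus textual "instructions".
struct Function {
  std::string name;
  std::vector<std::string> body;
  FunctionSummary summary;
  bool hasBody() const { return !body.empty(); }
};

// Derive the summary from the body. The textual matching is purely
// illustrative; a real pass would inspect instruction objects.
FunctionSummary summarize(const Function &f) {
  FunctionSummary s;
  for (const auto &inst : f.body) {
    if (inst.rfind("call", 0) == 0)      // starts with "call"
      s.callsOthers = true;
    if (inst.rfind("store @", 0) == 0)   // store to a global
      s.writesGlobals = true;
  }
  return s;
}

// Record the summary, write the function out, then free the body.
void emitAndDrop(Function &f, std::ostream &out) {
  f.summary = summarize(f);
  out << "define " << f.name << " {\n";
  for (const auto &inst : f.body)
    out << "  " << inst << "\n";
  out << "}\n";
  std::vector<std::string>().swap(f.body);  // really release the memory
}
```

After emitAndDrop the compiler holds only the name and two flags per
function, so peak memory tracks the largest single function rather than
the whole program.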

Taking the above suggestion to its logical conclusion, it might be
useful to create a general mechanism for passes to leave "tidbits" of
information around for other passes. The Annotation mechanism probably
could be used for this purpose but something a little more formal would
probably be better. It's highly likely there's something like this in
place already that I'm not aware of.

------------------------------------------------------------------
7. Idioms Package

As I learned from Stacker (the hard way), there are certain idioms that
occur in using LLVM over and over again. These idioms need to be either
(a) documented or (b) implemented in a library.  I prefer (b) because it
implies (a) ;>  Such idioms as if-then-else, for (pre; cond; post),
while(cond), etc. should be just coded into a framework so that compiler
writers have a slightly higher level interface to work with.  
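
As an example of the kind of helper meant here, the sketch below emits the
usual block structure for if-then-else. BlockEmitter is a hypothetical
class that produces assembly-like text; the syntax is only approximate
and a real idioms library would build BasicBlock objects instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

class BlockEmitter {
public:
  // Emit a conditional branch, a "then" block, an "else" block and a
  // merge label. Each call gets a fresh numeric suffix so that labels
  // from separate if-statements never collide.
  void ifThenElse(const std::string &cond, const std::string &thenCode,
                  const std::string &elseCode) {
    const std::string id = std::to_string(next_++);
    line("br " + cond + ", label %then" + id + ", label %else" + id);
    label("then" + id);
    line(thenCode);
    line("br label %endif" + id);
    label("else" + id);
    line(elseCode);
    line("br label %endif" + id);
    label("endif" + id);
  }

  const std::vector<std::string> &lines() const { return lines_; }

private:
  void line(const std::string &s) { lines_.push_back("  " + s); }
  void label(const std::string &s) { lines_.push_back(s + ":"); }

  std::vector<std::string> lines_;
  int next_ = 0;
};
```

The front end calls ifThenElse once per source-level "if" and never has to
think about label naming or where the branches go.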

Although I like this idea, it's low on my list because I regard LLVM as
_already_ incredibly easy to use as a compiler writer's tool. But, hey,
why stop at "incredibly easy" when there's "amazingly trivial" waiting
in the wings?

------------------------------------------------------------------
8. Create a ConstantString class

Constant strings are very common occurrences in XPL and probably are in
other source languages as well. The current implementation of
ConstantArray::get(std::string&) is a bit weak. It creates a
ConstantSInt for every character. What if the strings are long and the
program creates many of them? It seems a little heavyweight to me. I
can't think of a good reason not to support a ConstantString class that
retains the string as a std::string and does the right thing with it for code
generation.  I know that every character in the string must be
addressable and that coalescing them into a single object thwarts the
use-def chains, etc. But, couldn't ConstantString just "fake it"
somehow so we don't have so many objects created for a string?  One idea is
to just punt. If you use ConstantString then it's treated like a single
atomic memory object. Only the address of the first location can be
taken. If that doesn't fit the bill, you can always go back to a
ConstantArray of ConstantSInt. 
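
A sketch of the "punt" variant. Nothing here is existing LLVM API:
CharConstant stands in for the per-character constant objects, and the
ConstantString interface (size, firstAddress, expand) is an assumption
about what such a class might offer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the one-object-per-character representation.
struct CharConstant {
  signed char value;
};

// Hypothetical ConstantString: one object, however long the text is.
class ConstantString {
public:
  explicit ConstantString(std::string s) : data_(std::move(s)) {}

  std::size_t size() const { return data_.size(); }

  // The only address that may be taken: the first character.
  const char *firstAddress() const { return data_.c_str(); }

  // Escape hatch: fall back to one constant per character when
  // individual elements really must be addressable.
  std::vector<CharConstant> expand() const {
    std::vector<CharConstant> out;
    out.reserve(data_.size());
    for (char c : data_)
      out.push_back(CharConstant{static_cast<signed char>(c)});
    return out;
  }

private:
  std::string data_;
};
```

The cost of the per-character form is paid only by callers that ask for
expand(); everything else handles a single object per string.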

------------------------------------------------------------------
9. More Native Platforms Supported

To get the platform coverage that I need, I'm making the XPL compiler
use the C back end. It's slower to compile that way but I'll only need it
for those programs that want to go fully native. The back end support in
LLVM is a bit weak right now in terms of both optimizations available
and platforms supported. This isn't a big priority for me as there is a
viable alternative to native platform support. 

------------------------------------------------------------------

I'll do another one of these postings as I get nearer to the end of the
XPL Compiler implementation. There should be lots more ideas by then.
Don't hold your breath :)

Best Regards,

Reid.