[LLVMdev] 9 Ideas To Better Support Source Language Developers

Reid Spencer reid at x10sys.com
Wed Jan 7 13:35:01 PST 2004


On Wed, 2004-01-07 at 11:12, Chris Lattner wrote:
> > ------------------------------------------------------------------
> > 1. Definition Import
> > Source languages are likely to create lots of named type and value
> > definitions for the memory objects the language manipulates. Redefining
> > these in every module produces byte code bloat. It would be very useful
> > for LLVM to natively support some kind of import capability that would
> > properly declare global types and global values into the module being
> > considered.
> 
> Unfortunately, this would break the ability to take a random LLVM bytecode
> file and use it in a self-contained way.  In general, the type names and
> external declarations are actually stored very compactly, and the
> optimizers remove unused ones.  Is this really a problem for you in
> practice?

I'm trying to get to a "once-and-done" solution on compilation. That is,
a given module is compiled exactly once (per version). There's no such
thing as "include" in XPL, only "import". The difference is that
"import" loads the results of previous compilations (i.e. a bytecode
file).  I included it in my list because I thought it would be something
quite handy for other source languages (Java would need it, for
example). The functionality is something like Java's class loader, except
it's a module loader for LLVM and it doesn't load the function bodies.
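To make the idea concrete, here is a minimal, purely hypothetical sketch (the names `ModuleCache` and `import_module` are invented, and a dict stands in for saved declarations) of the once-and-done behavior: compile at most once per (path, version), and let every later import reuse the stored result.

```python
class ModuleCache:
    """Hypothetical once-and-done importer: a module is compiled at most
    once per (path, version); later imports reuse the saved result."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # stand-in for the real compiler
        self.cache = {}

    def import_module(self, path, version):
        key = (path, version)
        if key not in self.cache:
            self.cache[key] = self.compile_fn(path)  # compiled exactly once
        return self.cache[key]

compiled = []
cache = ModuleCache(lambda p: compiled.append(p) or {"decls": p})
cache.import_module("io.xpl", version=1)
cache.import_module("io.xpl", version=1)  # second import: no recompilation
assert compiled == ["io.xpl"]
```

A real importer would of course pull only the global types and declarations out of the cached module, not the function bodies.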

> 
> > Even better would be a way to have this capability supported
> > as a first-class citizen with some kind of "Import" class and/or
> > instruction: simply point an Import class to an existing bytecode file
> > and it causes the global declarations from that bytecode file to be
> > imported into the current Module.
> 
> We already have this: the linker.  :)   Just put whatever you want into an
> LLVM bytecode file, then use the LinkModules method (from
> llvm/Transforms/Utils/Linker.h) to "import" it.  Alternatively, in your
> front-end, you could just start with this module instead of an empty one
> when you compile a file...

Okay, I'll take a look at this and see if it fits the bill.

> 
> > ------------------------------------------------------------------
> > 2. Memory Management
> >
> > My programming system (XPS) has some very powerful and efficient memory
> > allocation mechanisms built into it. It would be useful to allow users
> > of LLVM to control how (and more importantly where) memory is allocated
> > by LLVM.
> 
> What exactly would this be used for?  Custom allocators for performance?
> Or something more important?  In general, custom allocators for
> performance are actually a bad idea...

My memory system can do seamless persistent memory as well (i.e. it's
almost a full scale OO Database).  One of my ideas for the "import"
functionality was to simply save the LLVM objects for each module
persistently.  Import then takes no longer than an mmap(2) call to load
the LLVM data structures associated with the module into memory. I can't
think of a faster way to do it.
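As a rough illustration of why the mmap route is so cheap, here is a small Python sketch (the file name and payload are made up): mapping the file makes the saved bytes addressable without a full read or deserialization pass, and the OS pages them in lazily on first access.

```python
import mmap
import os
import tempfile

def save_module(path, payload):
    # Persist the (hypothetical) serialized module once, at compile time.
    with open(path, "wb") as f:
        f.write(payload)

def load_module(path):
    # "Import" is just a map of the file -- no parsing, no copying.
    f = open(path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

path = os.path.join(tempfile.mkdtemp(), "xpl_module.bc")
save_module(path, b"MODULE" + bytes(1024))
view = load_module(path)
assert view[:6] == b"MODULE"  # data is visible immediately after mapping
```

A real implementation would need position-independent data structures, since the mapped objects must remain valid at whatever address the mapping lands.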

The reason this is so important to me is that I expect to be doing lots
of on the fly compilation. XPL is highly dynamic.  What I'm trying to
avoid is the constant recompilation of included things as with C/C++.
The time taken to recompile headers is, in my opinion, just wasted time.
That's why pre-compiled header support exists in so many compilers.

I have also tuned my allocators so that they can do multiple millions of
allocations per second on modest hardware. There's a range of allocators
available, each using a different algorithm and each with its own
space/time tradeoffs. The performance of "malloc(3)" sucks on most
platforms and sucks on all platforms when there's a lot of memory
thrash. None of my allocators suffer these problems.
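For flavor, here is a toy fixed-size pool allocator (sketched in Python with invented names; a real one would sit over raw memory in C/C++): allocation and free are just a push/pop on a free list, which is why such allocators avoid both malloc's search costs and fragmentation-induced thrash.

```python
class PoolAllocator:
    """Toy fixed-size-block pool: O(1) allocate/free via a free list."""

    def __init__(self, block_size, capacity):
        self.block_size = block_size
        self.pool = bytearray(block_size * capacity)  # one contiguous arena
        self.free_list = list(range(capacity))        # indices of free blocks

    def allocate(self):
        if not self.free_list:
            raise MemoryError("pool exhausted")
        return self.free_list.pop() * self.block_size  # offset into the arena

    def free(self, offset):
        self.free_list.append(offset // self.block_size)

pool = PoolAllocator(block_size=64, capacity=4)
a = pool.allocate()
b = pool.allocate()
pool.free(a)
c = pool.allocate()
assert c == a  # the freed block is reused at once: no search, no fragmentation
```

The space/time tradeoff is visible even here: the pool trades a fixed block size (internal fragmentation) for constant-time operations.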

Curious: why do you think custom allocators for performance are a bad
idea?
> 
> > ------------------------------------------------------------------
> > 3. Code Signing Support
> >
> I don't think that this really belongs in LLVM itself: Better would be to
> wrap LLVM bytecode files in an application (ie, XPL) specific file format
> that includes the digest of the module, the bytecode itself, and whatever
> else you wanted to keep with it.  That way your tool, when commanded to
> load a file, would check the digest, and only if it matches call the LLVM
> bytecode loader.

I'd probably be more inclined to just add an internal global array of
bytes to the LLVM bytecode format.  Supporting a new file format means
that I'd have to re-write all the LLVM tools -- not worth the time. 

So, I'll implement this myself and not extend LLVM with it.

> > ------------------------------------------------------------------
> > 4. Threading Support
> >
> > Some low level support for threading is needed. I think there are really
> > just a very few primitives we need from which higher order things can be
> > constructed. One is a memory barrier to ensure cache is flushed, etc. so
> > we can be certain a write to memory has "taken".
> 
> Just out of curiosity, what do you need a membar for?  The only thing
> that I'm aware of it being useful for (besides implementing threading
> packages) are Read-Copy-Update algorithms.

Um, to implement a threading package :)  I have assumed that, true to
its name, LLVM will only provide the lowest level primitives needed to
implement a threading package, not actually provide a threading package.
Surely you don't want to put all the different kinds of
synchronization concepts (mutex, semaphore, barrier, futex, etc.) into
LLVM? All of them need the membar.  For that matter, you'll probably
need an efficient thread barrier as well.

> > ------------------------------------------------------------------
> > 5. Fully Developed ByteCode Archives
> >
> This makes a lot of sense.  The LLVM bytecode reader supports loading a
> bytecode file from a memory buffer, so I think it would be pretty easy to
> implement this.  Note that llvm-ar is currently a work-in-progress, but it
> might make sense to implement support for this directly in it.  Afterall,
> we aren't constrained by what the format of the ".o" files in the .a file
> look like (as long as gccld and llvm-nm support the format).

But if the file gets compressed, it isn't a .a file any more, right? Or,
were you suggesting that only the archive members get compressed and the
file is otherwise an archive?  The problem with that approach is that it
limits the compression somewhat.  Think about an archive with 1000
bytecode files each using common declarations. Compressed individually
those common declarations are repeated in each file. Compressed en
masse, only one copy of the common declarations is stored, achieving
close to 1000:1 compression for those declarations.
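The effect is easy to demonstrate with a sketch using zlib (the declaration text below is made up): compressing the members individually stores the shared declarations once per member, while compressing the concatenated archive lets later members back-reference earlier ones.

```python
import zlib

# Made-up "common declarations" shared by every member of the archive.
common = b"declare i32 @printf(i8*, ...)\n" * 40
members = [common + b"define void @f%d() { ret void }\n" % i
           for i in range(100)]

# Compress each member on its own, as a per-member scheme would.
individually = sum(len(zlib.compress(m)) for m in members)

# Compress the whole archive as one stream.
en_masse = len(zlib.compress(b"".join(members)))

assert en_masse < individually  # shared text is stored once, not 100 times
```

Note that zlib's 32 KB sliding window limits how far back references reach; an archive compressor chasing the full 1000:1 win would want a larger window.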

> Also note that we are always interested in finding ways to shrink the
> bytecode files.  Right now they are basically comparable to native
> executable sizes, but smaller is always better!

Unfortunately, the answer to that is to utilize higher level
instructions. LLVM is comparable to native because it isn't a whole lot
higher level.  Compared with Java, whose byte code knows about things
like classes, LLVM will always be larger because expression of the
higher level concepts in LLVM's relatively low level takes more bytes.

That said, we _should_ strive to minimize the bytecode size where we can.

I haven't really looked into the bytecode format in much detail. Are we
doing things like constant string folding? Could the bytecode format be
natively compressed (i.e. not with bz2 or zip, but simply by not
duplicating anything in the output)?
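Here is a hypothetical sketch of that "don't duplicate anything" idea as a constant pool (the `ConstantPool` name and its API are invented): each distinct string is stored once, and everything else refers to it by index.

```python
class ConstantPool:
    """Toy constant pool: each distinct string is stored once and
    referenced by index, the way a natively compressed format might."""

    def __init__(self):
        self.strings = []  # storage, one entry per distinct string
        self.index = {}    # string -> index into `strings`

    def intern(self, s):
        if s not in self.index:
            self.index[s] = len(self.strings)
            self.strings.append(s)
        return self.index[s]

pool = ConstantPool()
ids = [pool.intern(s) for s in ["i32", "hello", "i32", "hello", "i32"]]
assert ids == [0, 1, 0, 1, 0]
assert len(pool.strings) == 2  # five references, two stored copies
```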


> 
> > ------------------------------------------------------------------
> > 6. Incremental Code Generation
> >
> > The conventional wisdom for compilation is to emit object code (or in
> > our case the byte code) from a compiler incrementally on a per-function
> > basis. This is necessary so that one doesn't have to keep the memory for
> > every function around for the entire compilation. This allows much
> 
> That makes sense.
> 
> > I'm not sure if LLVM supports this now, but I'd like LLVM to be able to
> > write byte code for an llvm::Function object and then "drop" the
> > function's body and carry on. It isn't obvious from llvm::Function's
> > interface if this is supported or not.
> 
> This has not yet been implemented, but a very similar thing has:
> incremental bytecode loading.  The basic idea is that you can load a
> bytecode file without all of the function bodies.

That's what I want for importing -- item (1) above!

> As you need the
> contents of a function body, it is streamed in from the bytecode file.
> Misha added this for the JIT.

Cool.

> 
> Doing the reverse seems very doable, but no one has tried it.  If you're
> interested, take a look at the llvm::ModuleProvider interface and the
> implementations of it to get a feeling for how the incremental loader
> works.

Okay, I'll see what I can come up with. 


> Developing a new "front-end helper" library could be interesting!  The
> only challenge would be to make it general purpose enough that it would
> actually be useful for multiple languages.

You're right, it would need to be useful for multiple languages. Here's
what I'll do: I'll revisit this when I get closer to done on the XPL
compiler. I'm building things now that are somewhat framework oriented.
If there are specific patterns that arise and could be useful, I'll
submit them back to the list at that time for review.

> 
> > ------------------------------------------------------------------
> > 8. Create a ConstantString class
>
> This is something that might make sense to deal with in the future, but it
> has a lot of implications in the compiler and optimizer.  

Consider it postponed.

Reid