[LLVMdev] Extending LLVM

Sat Mar 27 13:00:02 PST 2004

On Fri, 2004-03-26 at 14:57, Chris Lattner wrote:
> On Fri, 26 Mar 2004, Reid Spencer wrote:
> 
> > Would people other than myself find it useful to have a standardized
> > extension framework for LLVM? I'm thinking of something that would allow
> > new LLVM instructions, fundamental types, structured types, etc. This
> > would require significant work to allow the various pieces of LLVM
> > (assembler, disassembler, runtime, JIT, optimization, analysis, code
> > gen, etc.) to be extended arbitrarily.
> 
> It sounds interesting!  However, I would make sure that you are _very_
> clear about what the goals of this are.  Many of the things you mentioned
> have very different uses.  In particular, what can't LLVM do now that
> adding this would enable?

Like I said in the original post, I wasn't going to float this idea
until it was a little more baked. I agree that the goals would have to
be very clearly defined. Currently, this is just an idea that might help
XPS provide extensibility a little more easily (pass the buck!). There
probably isn't anything that the current LLVM plus a good runtime
library couldn't provide. The basic idea is to allow for optimization of
higher level concepts/constructs within the LLVM framework.  

One good example I can think of is transactions. It might be possible to
add something like a "start transaction" and "end transaction"
instruction into LLVM (as an extension) to demarcate "all or none"
processing boundaries. From an analysis and possibly optimization
perspective this could be interesting. There are other higher level
concepts such as sessions and clusters that one might want to express at
the LLVM level for optimization.

That said, this idea is very early in its genesis.  As you noted, much
more thought would need to be put into it.

> 
> > Each extension would be a self-contained piece of code that could be
> > dynamically loaded by LLVM core and used at the right times during LLVM
> > execution.
> 
> Again, the examples you gave are all over the board.  

Not sure what you meant by "all over the board". What board? :)
I assume you meant that the examples were quite varied. However, you
might have meant that the extension framework I'm suggesting would
affect many parts of LLVM (which it would).  Can you be a little more
specific?

> This would require
> being able to plug things into the code generator, various
> transformations, etc.  While possible, I'm curious what the real goals
> are.

The goal would be to allow source language specific concepts to benefit
from the LLVM optimization architecture.  However, as  mentioned above,
probably a well designed runtime library would have the same effect
without requiring LLVM to support extension.

> 
> > Such things as threading support,
> 
> It is already planned for LLVM to get some sort of first-class support for
> threading and mutual exclusion, but again, what real purpose does adding
> it to the language give you?  

Isn't this self-contradictory? You're asking what advantage adding
first-class support for threading and mutual exclusion would give me
while also stating that its already planned. Obviously there is *some*
purpose to it! :)

Perhaps threading support isn't the best example. I think transactions,
sessions, and clustering are better examples.  I suggested threading
because it *could* be implemented as an extension. That is, the core
LLVM semantics are for single-threaded applications.  The threading
extension would provide platform independent support for basic thread
management and synchronization primitives.

> In the case of thread support, we are going
> for two primary goals:
> 
> 1. Allow multithreading to be expressed at the LLVM level WITHOUT
>    target-specific dependences.  This would allow the same threaded LLVM
>    bytecode file to run on both linux with pthreads and windows with their
>    threads, for example.
> 2. Expose the semantics of the operations to the optimizer, to allow
>    redundant lock removal, etc.

Right.

> 
> Note that #2 isn't even strictly necessary either.  It's perfectly
> reasonable to look for "pthread_mutex_lock" and do optimizations. 

Perhaps, but what if its "WaitForMultipleObjects" as in Win32. Point 2
becomes important in the face of point 1 (platform independence). You
don't want to have to write multi-threading optimizations for every
platform on which LLVM runs. If the concept (mutex lock) is supported by
an extension and that extension also contains all the optimizations and
code generation for that concept, I think this is a win architecturally.
If you don't want threading support, don't load the extension. If you do
want threading support, it completely extends LLVM to support it in all
the things LLVM does (byte code generation, disassembly, code
generation, analysis, optimization, etc.).

>  Our
> hope is that we the eventual threading support that LLVM has will be a
> lower-level subset of what pthreads provides (providing enough to
> implement pthreads on top of it) making it easier to analyze and xform
> than pthreads calls directly.

Right, I agree.  My point was that all of this support for threading
*could* be provided as an extension to LLVM (not that it necessarily
*should* be).

> 
> > advanced math functions, scientific computations and types
> > (rational,complex,etc.), higher order data structures, etc. could all be
> > done as extensions instead of adding them to the core.
> 
> I'm not sure what adding support for these to the LLVM language would
> allow us to do.  Can you give some examples?

Suppose you had a math package that provided (a) a wealth of
mathematical types and functions and (b) made it platform independent.
There are numerous optimizations that could be applied to reduce the
computation necessary. Floating point computation can be very expensive
and is a good target for optimization.  Simple things like sin(x)/cos(x)
could be transformed into tan(x). There are numerous other computations
that could be simplified. 

Additionally, consider rational numbers. This type requires two integer
data values: the numerator and the denominator. I haven't thought about
it much but I'll bet there are optimizations on rational numbers that
could benefit from knowing that two integer data values in adjacent
memory locations are actually a single "rational". There are definitely
fundamental operations on rationals (e.g. GCD, LCD) that can take
advantage of knowing that a pair of integers is actually a rational
number. This exact same argument can be made for complex numbers and
infinite precision integer and floating point representations. 

In the realm of data structures, you are already working on this in
terms of providing a pool allocator for elements of the same data
structure. There's lots more that could be done.

In XPS I have this notion of mutable collections. That is, the optimal
underlying representation (array, list, b-tree, hash-table)  of a
generic collection of information depends on a variety of factors:
frequency of insert vs. remove, need for fast associative retrieval, and
size of the collection (amongst others).  For example, for collections
of size less than, say, 3, an array works just fine. Linear scan to find
the element is even faster than a hash table or linked list.  Two things
could be done in this area: (1) optimization of the choice of collection
representation based on static analysis of the program, and (2) mutation
of the collection to different representations at runtime based on usage
patterns (i.e. another form of runtime, profile-based optimization). 

Making LLVM handle all this "out of the box" could be onerous. Not all
applications need to use mutable collection data structures, and fancy
mathematics types. Much less, transactions, sessions, clustering, etc. 
So, providing a generic extension mechanism would provide the source
language writer a way to plug in new concepts/types/optimizations into
LLVM. It would be *REALLY* nice if there was a way to extend the
bytecode for a specific source language. But, I digress.

Before you say it, I agree that LLVM might already be extensible enough
and that all of the above could be provided with various optimizations
and runtime libraries provided by the source level language.  It just
seems conceptually simpler to me if there was some well-defined (and
simpler) interface to the extensions provided by LLVM.

> > This would keep the core small(ish) and allow researchers and
> > implementors to add things to LLVM in an isolated fashion.
> 
> This is definitely a great goal, but can you be specific about what kinds
> of things this will allow LLVM to do?  In particular, none of these have
> property #1 above (they can already be implemented portably with existing
> LLVM constructs) so they don't increase the "expressiveness" of LLVM.

I guess I don't look at this as "what kinds of things this will allow
LLVM to do" or how they'd "increase the expressiveness of LLVM".  To me,
its just a cleaner/simpler interface to LLVM for source language
writers. 

I definitely need to think about this more as I implement XPL. So far, I
haven't run into anything that I definitely thought "gee, it would sure
be nice if LLVM would allow extension of X".  However, so far my
compiler is dealing with the low level stuff of arithmetic computation
and basic control flow.  When I get to the higher level stuff
(transactions, sessions, authentication, database access, etc.) I'll
need to give this some more thought.

Reid
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20040327/3d7f4932/attachment.sig>