[LLVMdev] Extending LLVM

Sun Mar 28 00:25:01 PST 2004

On Sat, 27 Mar 2004, Reid Spencer wrote:
> Like I said in the original post, I wasn't going to float this idea
> until it was a little more baked. I agree that the goals would have to
> be very clearly defined.

Ok.

> Currently, this is just an idea that might help XPS provide
> extensibility a little more easily (pass the buck!). There probably
> isn't anything that the current LLVM plus a good runtime library
> couldn't provide.

Well adding stuff to LLVM *only* to make source-language front-ends easier
to implement is probably not something that we want to do....

> The basic idea is to allow for optimization of higher level
> concepts/constructs within the LLVM framework.

... however, if there is value added by doing this at LLVM, then it can
make a lot of sense to do so.  :)

> One good example I can think of is transactions. It might be possible to
> add something like a "start transaction" and "end transaction"
> instruction into LLVM (as an extension) to demarcate "all or none"
> processing boundaries. From an analysis and possibly optimization
> perspective this could be interesting. There are other higher level
> concepts such as sessions and clusters that one might want to express at
> the LLVM level for optimization.

Interesting idea.  I'm not sure exactly what the implications of this
would be, but if it was worked out it might be interesting.

> > > Each extension would be a self-contained piece of code that could be
> > > dynamically loaded by LLVM core and used at the right times during LLVM
> > > execution.
> >
> > Again, the examples you gave are all over the board.
>
> Not sure what you meant by "all over the board". What board? :)
> I assume you meant that the examples were quite varied. However, you
> might have meant that the extension framework I'm suggesting would
> affect many parts of LLVM (which it would).  Can you be a little more
> specific?

Sorry, I did mean quite varied.

> > > Such things as threading support,
> >
> > It is already planned for LLVM to get some sort of first-class support for
> > threading and mutual exclusion, but again, what real purpose does adding
> > it to the language give you?
>
> Isn't this self-contradictory? You're asking what advantage adding
> first-class support for threading and mutual exclusion would give me
> while also stating that its already planned. Obviously there is *some*
> purpose to it! :)

Sorry, I meant it as a rhetorical question :)

> Perhaps threading support isn't the best example. I think transactions,
> sessions, and clustering are better examples.  I suggested threading
> because it *could* be implemented as an extension. That is, the core
> LLVM semantics are for single-threaded applications.  The threading
> extension would provide platform independent support for basic thread
> management and synchronization primitives.

Okay, those are all MUCH higher-level concepts. :)  If your extension
mechanism could allow us to do interesting things with such varied
concepts, that would be a huge win.  :)

> > In the case of thread support, we are going
> > for two primary goals:
> >
> > 1. Allow multithreading to be expressed at the LLVM level WITHOUT
> >    target-specific dependences.  This would allow the same threaded LLVM
> >    bytecode file to run on both linux with pthreads and windows with their
> >    threads, for example.
> > 2. Expose the semantics of the operations to the optimizer, to allow
> >    redundant lock removal, etc.
>
> Right.
>
> >
> > Note that #2 isn't even strictly necessary either.  It's perfectly
> > reasonable to look for "pthread_mutex_lock" and do optimizations.
>
> Perhaps, but what if its "WaitForMultipleObjects" as in Win32.

Yup, there are many reasons why this approach isn't ideal. :)

> Point 2 becomes important in the face of point 1 (platform
> independence). You don't want to have to write multi-threading
> optimizations for every platform on which LLVM runs.

Exactly.

> If the concept (mutex lock) is supported by an extension and that
> extension also contains all the optimizations and code generation for
> that concept, I think this is a win architecturally. If you don't want
> threading support, don't load the extension. If you do want threading
> support, it completely extends LLVM to support it in all the things LLVM
> does (byte code generation, disassembly, code generation, analysis,
> optimization, etc.).

This is something that concerns me.  Something that I've tried very hard
to do with LLVM is to seperate orthogonal ideas as much as possible.  This
is what makes it relatively easy to add a source-langauge or target to
LLVM.  If extensions could require code-gen changes, that would mean that
they would require changes to ALL of the code generators implemented,
which poses realistic scalability problems.

The way we currently deal with this is through the 'IntrinsicLowering'
interface.  Basically a code generator supports some set of intrinsics,
for example %llvm.memcpy but not %llvm.memmove.  The IntrinsicLowering
class knows how to lower an unhandled intrinsic to non-intrinsic LLVM
code (in this case turning llvm.memmove -> memmove).  This means that it's
easy to add intrinsics to LLVM without implementing them in every code
generator (just add it to the lowering class).

The only time that this does not work is in cases where code-generator
specific support is required, such as llvm.returnaddress or threading
primitive implementations.  These cannot be implemented in terms of other
LLVM instructions.  I think the number of these required is extremely
limited, and adding them to all code generators makes sense, but they are
tightly-coupled to the code generator so I don't think it makes sense to
implement them as "extensions".

> > > advanced math functions, scientific computations and types
> > > (rational,complex,etc.), higher order data structures, etc. could all be
> > > done as extensions instead of adding them to the core.
> >
> > I'm not sure what adding support for these to the LLVM language would
> > allow us to do.  Can you give some examples?
>
> Suppose you had a math package that provided (a) a wealth of
> mathematical types and functions and (b) made it platform independent.
> There are numerous optimizations that could be applied to reduce the
> computation necessary. Floating point computation can be very expensive
> and is a good target for optimization.  Simple things like sin(x)/cos(x)
> could be transformed into tan(x). There are numerous other computations
> that could be simplified.

Sure.  Something I have long-considered adding is some sort of
programmable "peephole" optimization pass for LLVM call instructions.
Imagine being able to do simple pattern matches to do things like:

1. sin(x)/cos(x) -> tan(x)    (iff using the equiv of -ffast-math)
2. exit(n) in "main" -> ret n
3. printf("abc\n") -> puts("abc");
4. etc, there are countless things that you could do with libc/libm

I think that you could get a lot of mileage out of a simple system
like this, especially if you let the front-end pass down the patterns.
This would also allow the LLVM optimizer to do C++ copy-constructor
elision late in the optimization process if done right.

> Additionally, consider rational numbers. This type requires two integer
> data values: the numerator and the denominator. I haven't thought about
> it much but I'll bet there are optimizations on rational numbers that
> could benefit from knowing that two integer data values in adjacent
> memory locations are actually a single "rational". There are definitely
> fundamental operations on rationals (e.g. GCD, LCD) that can take
> advantage of knowing that a pair of integers is actually a rational
> number. This exact same argument can be made for complex numbers and

I'm not convinced about these examples, but you can get a win by
expressing information that is not already visible to the optimizer.
Consider simple refcounting optimizations for something like this:

addref(x); addref(x) dropref(x);

These three calls will probably expand into something like this:

++x.refcount
++x.refcount
if (--x.refcount == 0) <something>

The LLVM optimizer will already turn this into:

++x.refcount
if (x.refcount == 0) <something>

... but it cannot delete the if condition.  The reason is that it doesn't
know that the refcount won't wrap around.  Expressing things like *this*
to the optimizer could be useful.

> infinite precision integer and floating point representations.

These are also useful, and could make use of a simple xform system like
the above to implement optzns like (X + 0) == X on infinite precision
integer numbers, for example.

> > This is definitely a great goal, but can you be specific about what kinds
> > of things this will allow LLVM to do?  In particular, none of these have
> > property #1 above (they can already be implemented portably with existing
> > LLVM constructs) so they don't increase the "expressiveness" of LLVM.
>
> I guess I don't look at this as "what kinds of things this will allow
> LLVM to do" or how they'd "increase the expressiveness of LLVM".  To me,
> its just a cleaner/simpler interface to LLVM for source language
> writers.

Ok, as I said before, this is NOT enough to justify adding it to LLVM
(perhaps it could be added to a HLVM or some such :).  However, I do think
there is room here to do interesting things!

> I definitely need to think about this more as I implement XPL. So far, I
> haven't run into anything that I definitely thought "gee, it would sure
> be nice if LLVM would allow extension of X".  However, so far my
> compiler is dealing with the low level stuff of arithmetic computation
> and basic control flow.  When I get to the higher level stuff
> (transactions, sessions, authentication, database access, etc.) I'll
> need to give this some more thought.

Sounds good.  As the ideas get more concrete, it will be interesting to
look at this again in the future. :)

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/