[LLVMdev] Scheme + LLVM JIT

Thu May 5 09:11:26 PDT 2005

On Thu, May 05, 2005 at 03:46:58AM -0400, Alexander Friedman wrote:
> On May  5, Misha Brukman wrote:
> > To the best of my knowledge, this has not been done and no one has
> > announced their intent to work on it, so if you are interested,
> > you'd be more than welcome to do so.
> 
> My C++ knowledge is completely non-existant, but so far I've had a
> surprisingly easy time reading the source. This change seems somewhat
> involved - I will have to implement different calling conventions -
> ie, passing a return-address to the callee, etc. Who is the right
> person to talk to abot this?

The notes you refer to belong to Chris Lattner, but you should just post
your questions on llvmdev and someone will answer them.  The benefits
are that you may get your response faster than emailing someone
directly, you may get multiple perspectives, and the discussion is
archived for future LLVMers who are looking for some similar advice.

> Ok, this makes sense. However, am I correct in assuming that the
> interaprocedural optimizations performed in gccas will make it
> problematic to call 'JIT::recompileAndRelinkFunction()' . For example,
> suppose I run run some module that looks like
> 
> module a
> 
> int foo () {
>  ...
>  bar()
>  ...
> }
> 
> int bar () {
>  ...
> }
> 
> through all of those optimizations. Will the result nessisarily have a
> bar() function? 

You are correct, it may get inlined.

> If inlining is enabled, replacing bar might have no effect if it's
> inlined in foo. 

True.

> If inlining is not enabled, are there other gotcha's like this?

There are optimizations that will remove dead code (unreachable basic
blocks, functions that are never called directly or possibly aliased,
global variables that are not referenced or possibly aliased),
optimizations that eliminate unused arguments from functions
interprocedurally, structs may be broken up into constituent elements
(scalar replacement of arguments), arguments may become passed by value
instead of by reference if it's legal to do so, etc.

These all affect the structure of the code and if you care to preserve
some elements of it, you might do well to select your own set of
optimization passes to do what you want and none of the things you
don't.

Alternatively, I *think* you can use the debugging intrinsics to "mark"
what you want to be preserved and it will pessimize the optimizations.
I am not well-versed with the debugging intrinsics, so that's a guess. 
See http://llvm.cs.uiuc.edu/docs/SourceLevelDebugging.html for info.

However, let's step back for a second.  I am talking about what effect
gccas/gccld will have on code generated by some front-end.  Presumably,
you want to write a single stand-alone JIT that will take scheme -> LLVM
-> native code via JIT.  Hence, gccas/gccld optimization selection
doesn't really apply to you.  You can write your own "tool" that will
use the JIT libraries and the optimizations of your choosing, if you so
desire.

If, however, you are using the gccas/gccld + lli model, then we're
talking about two different "Modules", i.e. the one before optimization
that gccas/gccld hack on, and the one after optimization that lli
actually sees and compiles/executes.

Once the bytecode file is passed to the JIT, *that* is what I would call
the "Module being executed".  So while you are in the JIT environment,
the JIT will not do any inlining.  Any inling will have already been
performed.  So if the JIT has no function "bar", well, it's not that it
"disappeared", it's just that in the lifetime of the JIT, it never
existed in the first place.

> If there are complications like this, how much of a performance gain
> do the interprocedural opts give?

I don't have any numbers for this, because the inter-procedural
optimizations are bunched in with intra-procedural ones, and we always
run them all.  So we don't really have any measurements for the gain of
JUST interprocedural optimizations.

You can try turning off ALL optimizations with 

  llvm-gcc -Wa,-disable-opt -Wl,-disable-opt [...]

The first will disable all gccas optimizations and the second -- gccld.
The problem is that by removing them all it REALLY pessimizes the code,
as all locals will be stack-allocated, for instance.

If inlining is really the biggest problem you're facing, there's a
-disable-inlining flag for gccld.

> Also, compileAndRelink (F) seems to update references in call sites of
> F. Does this mean that every function call incurs an extra 'load' , or
> is there some cleverer solution?

We don't track all the call sites.  Instead, what recompile and relink
does is adds an unconditional branch from the old function (in native
code) to the new one (in native code again), so what this does is add an
extra indirection to all subsequent calls to that function, but not an
extra load.

One cleverer solution would be to actually track all the call sites, but
then if recompile-and-relink is called rarely, it would be an extra
overhead, not a gain, so it would slow down the common case.

Another cleverer solution would be to overwrite the machine code
in-place with the code for the new function, but then the problem is
that we lay out the code for functions sequentially in memory so as to
not waste space, and hence, each recompilation of the function better
fit into the place of the old one, or else we might run into the code
region of the next function.  This means that we then have to break up
the code region for a function into multiple code sections, possibly
quite far apart in memory, and this leads to more issues.

So we've taken the simpler approach above which works.

> Finally, if I jit-compile several modules, can they reference each
> other's functions? If this is answered somewhere in the docs, I
> appologize.

At present, I am not quite sure that the JIT will accept two different
Modules, most tools (except the linkers) assume a single Module that is
given to them.  I have not used two Modules with the JIT and I haven't
seen anyone else do that, so it maybe a limitation or it just may need
some extention to support it, I'm not sure.

> Why use linear scan on X86? 

I should mention that the SparcV9 is different in structure from the
other backends so the only backend that can use the graph-coloring
register allocator is the SparcV9 backend.  The linear-scan was written
in a different format, and it can be used on any backend except the V9.

Also, why linear scan vs. XYZ?  Because someone wrote it and contributed
it to LLVM (that someone happens to be Alkis, and he's a member of the
LLVM group). :)

> Does it have some benefits over graph-coloring? 

Algorithmically, I think it's O(n) for linear scan vs. O(n^3) for graph
coloring.  Specifically for LLVM, Alkis wrote a linear-scan allocator
and so we have it.  Someone at a different school wrote a graph-coloring
allocator for LLVM, but they haven't contributed it back to us, so we
don't have it.

> FWIW, Lal George has a paper on using graph coloring on the register
> poor X86 by implicitly taking advantage of the Intel's register
> mapping to emulate 32 registers. The result is between 10 and 100%
> improvement on the benchmarks he ran (but the allocater is 40%
> slower).

It's not that we're against graph-coloring per se, it's just that no one
has stepped up to the plate to do it and share the code with us.

I should mention that we accept patches. :)

> > llvm/examples/HowToUseJIT pretty much has the minimal support one needs
> > for a JIT, but if you can make it even smaller, I'd be interested.
> 
> Sorry, what i actually meant was: what are the minimum libraries that
> I have to compile in order to be able to build the HowToUseJIT (and
> all the passes in gccas/ld).

The Makefiles for gccas/gccld/lli list the libraries they use in the
USEDLIBS variable.  Additionally, lli uses the special meta-library
"JIT" which expands to include the generic JIT libraries and the target
backend(s) that you have selected.  See llvm/Makefile.rules, search for
JIT_LIBS and read from there.

You want to take the union of these sets, and that would do it.

> Yes, I just tried with cvs and It still compiles all back-ends. I'll
> try it again to make sure, and then report the bug.

Instead of opening a new bug, please reopen this one:
http://llvm.cs.uiuc.edu/PR518

> It's not the linking/relocating that's the problem. The problem is
> that each binary winds up being rather large. However, since these
> tools don't need to be distributed or compiled for my purposes, I
> guess i'm not really worried about it.

Compiling optimized binaries rather than debug build would save quite a
bit of space in the binaries.  Other than that, I'm not really sure,
except maybe to compile LLVM with LLVM and then use its aggressive
optimizations and dead-code elimination transformations? :)

-- 
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu