[LLVMdev] Compiler Driver Decisions

Wed Aug 4 08:42:15 PDT 2004

On Wed, Aug 04, 2004 at 10:24:20AM -0500, John Criswell wrote:
> o Object Files
> 
> I've noticed that there's a general agreement that we should not
> encapsulate LLVM bytecode files inside of another object format (such
> as ELF).  However, I'd like to pose a few potential benefits that
> encapsulation in ELF might provide:
> 
> 1) It may provide a way for standard UNIX tools to handle bytecode
> files without modification.  For example, programs like ar, nm, and
> file all take advantage of the ELF format.  If we generated LLVM ELF
> files, we wouldn't need to write our own nm and ar implementations and
> port them to various platforms.

Running the system `nm' on an LLVM bytecode file is meaningless.  Right
now, we already have an llvm-nm, which finds the *LLVM* symbols (globals
and functions) and prints out whether each one is defined or not.

If we just plop the binary LLVM bytecode in an ELF section, it will go
happily ignored by the system nm, and no useful output will be produced.

So, in essence, we *do* need our own nm, ar, etc.  Otherwise, what
you're suggesting is that every bytecode file sits in its own ELF
section alongside a *FULL* native translation, which is overkill, IMHO.

> 2) It could mark the bytecode file with other bits of useful
> information, such as the OS and hardware on which the file was
> generated.

We already have that: in addition to pointer size, Reid has added the
capability to encode the target triple of the system directly into the
bytecode file.

> 3) It may provide a convenient means of adding dynamic linking with 
> other bytecode files.

Reid has added this as well.

> 4) It may provide a convenient place to cache native translations for
> use with the JIT.

This is an interesting concept, but it seems to be the only one of the
four left standing, and I'm not sure it's worth the trouble of writing
and re-writing and re-patching native code to support this...

> Here are the disadvantages I see:
> 
> 1) Increased disk usage.  For example, symbol table information would 
> duplicate the information already in the bytecode file.

True that.

> 2) Automatic execution.  Ideally, if I have a bytecode executable, I
> want to run it directly.  On UNIX, that is done with #!<interpreter>.
> I believe ELF provides similar functionality (where exec()ing the file
> can load a program or library to do JIT compilation), but if it
> doesn't, then we lose this feature.

1. Use LLEE :)
2. Tell the OS (in this case Linux) how to run bytecode files directly:
   http://llvm.cs.uiuc.edu/docs/GettingStarted.html#optionalconfig
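
For option 2, the Linux mechanism is binfmt_misc, which takes a rule of
the form :name:type:offset:magic:mask:interpreter:flags.  Here is a rough
sketch of building such a rule.  The magic bytes "llvm" and the path
/path/to/lli are assumptions for illustration only, and
binfmt_misc_rule is a made-up helper; check the page linked above for
the values that match your install.

```python
def binfmt_misc_rule(name, magic, interpreter, offset=0):
    """Build a magic-based ('M') binfmt_misc registration string.

    Hypothetical helper: it only formats the rule, it does not register it.
    """
    for field in (name, interpreter):
        if ":" in field:
            raise ValueError("fields must not contain the ':' delimiter")
    # Escape every magic byte as \xNN so the rule stays printable.
    magic_hex = "".join("\\x%02x" % b for b in magic)
    # An empty offset field means "match at the start of the file".
    offset_field = "" if offset == 0 else str(offset)
    # Layout: :name:type:offset:magic:mask:interpreter:flags
    # (mask and flags are left empty here).
    return ":%s:M:%s:%s::%s:" % (name, offset_field, magic_hex, interpreter)


# Assumed values: magic b"llvm" and an lli installed at /path/to/lli.
rule = binfmt_misc_rule("llvm", b"llvm", "/path/to/lli")
print(rule)
```

Writing the resulting string to /proc/sys/fs/binfmt_misc/register (as
root, with binfmt_misc mounted) is what actually installs the handler.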

> o Compiler Driver Name
> 
> I'd vote for either llvmcc (llvm compiler collection) or llvmcd (llvm
> compiler driver).  To be more convenient, we could call it llc (LLvm
> Compiler) or llcd (LLvm Compiler Driver).  Calling it llc would
> require renaming llc to something else, which might be appropriate
> since I view llc as a "code generator" and not as a "compiler"
> (although both terms are technically accurate).

I've voted for llvmcc before, but it was turned down.

LLC is a nice idea, but yeah, it's already taken, and sounds like LCC
which is another compiler...

llvmcd sounds like "chdir compiled to llvm" or "LLVM-specific chdir"
given the other tools: llvm-as, llvm-gcc, etc.

> o Optimization options
> 
> I agree with the idea of using -O<number> for increasing levels of
> optimization, with -O0 meaning no optimization.  It's a pretty
> intuitive scheme, and many Makefiles that use GCC use the -O option.

I agree with -O0 instead of -On.
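
To make the -O<number> scheme concrete, here is a minimal sketch of how
a driver might pick the level out of its arguments.  Nothing here is an
existing LLVM interface: parse_opt_level, the default of 0 when no flag
is given, and the "last flag wins" rule are all assumptions.

```python
import re


def parse_opt_level(args, default=0):
    """Return the optimization level requested by -O<number> flags.

    Hypothetical driver helper: -O0 means no optimization, and higher
    numbers mean more.  Assumes the last -O<number> on the line wins.
    """
    level = default
    for arg in args:
        match = re.fullmatch(r"-O(\d+)", arg)
        if match:
            level = int(match.group(1))
    return level


print(parse_opt_level(["-O2", "foo.c"]))
print(parse_opt_level(["-O3", "foo.c", "-O0"]))
```

Anything that doesn't look like -O<number> is simply ignored here, so
other flags and input files pass through untouched.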

-- 
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu