[LLVMdev] How the LLVM tools work together

Sat Oct 30 00:25:33 PDT 2010

On Thu, Oct 28, 2010 at 4:41 PM, Stephen Norman <stenorman2001 at me.com> wrote:
> Hi,
>
> I've been reading through some of the documentation and I'm a little confused.
>
> What I'm wondering is if someone could explain how the different tools in LLVM (llvmc, clang, llvm-gcc, llvm-ar, etc.) work together to go from the C code I create through to a running executable (after linking).
>
> Apologies if this isn't the right list. I'm not a compiler developer so I'm rather a novice with how LLVM works.
>
> Cheers,
>
> Stephen

Most of the tools are really just compiler hacker tools that we use
for development, testing, and demonstration. LLVM is designed to be
used as a set of libraries rather than a set of tools. However, there's
nothing stopping you from doing each step individually, and it can be
quite informative.

clang contains a driver, much like gcc, that takes the source files
and options you provide and produces the desired output. This can be
anything from just preprocessing all the way down to a final
executable.

So the command:
% clang -O3 source.c -o prog.exe

can be broken down into:

* Pre-process
% clang -E source.c -o source.ii
* Compile to the LLVM intermediate representation
    - This file is a human-readable representation of the C input code
for the specified target.
% clang -S -emit-llvm source.ii -o source.ll
* Optimize
    - This runs a set of optimizations on source.ll and writes the
optimized result as bitcode, the binary encoding of llvm-ir. Pass -S
to opt to get human-readable output instead.
% opt -O3 source.ll -o source-opt.bc
* Generate machine code
    - This lowers the llvm-ir to the target instruction set and
optimizes it along the way.
% llc -O3 source-opt.bc -o source.s
* Assemble
% as source.s -o source.o
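* Link
    - A bare ld call like this normally also needs the C runtime startup
files and libc on its command line. Running "clang source.o -o prog.exe"
lets the driver supply them.
% ld source.o -o prog.exe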
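
The stages above can be strung together in a small script. This is only a
sketch: the helper names run and stages and the DRY_RUN variable are made
up for illustration and are not part of any LLVM tool. By default it only
prints each command, so it is safe to try on a machine without LLVM.

```shell
#!/bin/sh
# Sketch: print (or run) each stage of the pipeline listed above.
# run, stages and DRY_RUN are illustrative names, not LLVM features.

# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    printf '%% %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# stages BASE: take BASE.c through every stage, stopping at the first failure.
stages() {
    base=$1
    run clang -E "$base.c" -o "$base.ii" &&
    run clang -S -emit-llvm "$base.ii" -o "$base.ll" &&
    run opt -O3 "$base.ll" -o "$base-opt.bc" &&
    run llc -O3 "$base-opt.bc" -o "$base.s" &&
    run as "$base.s" -o "$base.o" &&
    run ld "$base.o" -o prog.exe
}

# Dry run by default: just list the six commands for source.c.
stages source
```

Setting DRY_RUN=0 before calling stages actually runs the tools, assuming
they are all on your PATH and source.c exists in the current directory.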

clang doesn't directly run all these commands. It uses the libraries
internally to do everything up to assembly output, and on some
platforms it even does the assembling internally.
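
If you want to see what the driver decides to do, clang has a -### option
that prints the jobs it would run instead of running them, and -save-temps
keeps the intermediate files of a real build. A rough sketch follows; the
file hello.c and the temporary directory are placeholders, and the exact
output differs between clang versions and platforms.

```shell
#!/bin/sh
# Sketch: ask the clang driver what it would do.
# hello.c and the temp directory are placeholders for illustration.
work=$(mktemp -d)
cat > "$work/hello.c" <<'EOF'
int main(void) { return 0; }
EOF

if command -v clang >/dev/null 2>&1; then
    # -### prints the jobs the driver would run instead of running them.
    clang -### -O3 "$work/hello.c" -o "$work/prog.exe"
    # -save-temps performs the build but keeps the intermediate files.
    (cd "$work" && clang -save-temps -O3 hello.c -o prog.exe && ls)
else
    echo "clang not found on PATH; skipping"
fi
```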

* llvm-{as,dis} just convert between bitcode and human-readable
llvm-ir: llvm-as assembles a .ll file into bitcode, and llvm-dis does
the reverse.
* llvm-ar is for creating standard archives containing bitcode.
* llvmc ... I'm still confused about the exact reason for this one.
* llvm-diff produces intelligent diffs between two llvm-ir files
ignoring names. Makes it much easier to tell what semantics changed
when values are renamed.
* llvm-ld is really just a driver for the system linker. It can also
produce scripts that run the bitcode via lli.
* llvm-link links llvm-ir files together.
* llvm-mc is the machine code playground. It can be used as an
assembler, disassembler, and other things.
* llvm-nm is classic unix nm for llvm-ir. It dumps the symbol table.
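
A few of the tools in this list can be tried on hand-written llvm-ir. The
sketch below is only an illustration: the file names and the functions
@one and @two are invented, and it skips the tool calls when the LLVM
tools are not on your PATH.

```shell
#!/bin/sh
# Sketch: exercise llvm-as, llvm-dis, llvm-link and llvm-nm on two tiny modules.
# File names and the functions @one/@two are made up for illustration.
dir=$(mktemp -d)
cat > "$dir/a.ll" <<'EOF'
define i32 @one() {
  ret i32 1
}
EOF
cat > "$dir/b.ll" <<'EOF'
define i32 @two() {
  ret i32 2
}
EOF

have_tools=yes
for tool in llvm-as llvm-dis llvm-link llvm-nm; do
    command -v "$tool" >/dev/null 2>&1 || have_tools=no
done

if [ "$have_tools" = yes ]; then
    llvm-as "$dir/a.ll" -o "$dir/a.bc" &&               # readable IR -> bitcode
    llvm-as "$dir/b.ll" -o "$dir/b.bc" &&
    llvm-dis "$dir/a.bc" -o "$dir/a.dis.ll" &&          # bitcode -> readable IR
    llvm-link "$dir/a.bc" "$dir/b.bc" -o "$dir/ab.bc" && # merge the two modules
    llvm-nm "$dir/ab.bc"                                # list symbols in the result
else
    echo "LLVM tools not found on PATH; skipping"
fi
```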

And I don't know what the rest are for exactly.

You don't need to know about any of these to use clang or llvm-gcc,
but they can be useful when playing with llvm.

- Michael Spencer