[LLVMdev] Some question on LLVM design

Sat Oct 23 23:13:05 PDT 2004

On Fri, 22 Oct 2004, Marc Ordinas i Llopis wrote:

> Hi everybody,

Hi!

> I'm currently looking at LLVM as a possible back-end to a dynamic
> programming system (in the tradition of Smalltalk) we are developing. I

Very cool!

> have read most of the llvmdev archives, and I'm aware that some things
> are 'planned' but not implemented yet. We are willing to contribute the
> code we'll need for our project, but before I can start coding I'll have
> to submit to my boss a very concrete proposal on which changes I'll make
> and how long they're going to take.

Ok.

> So before I can present a concrete proposal, I have some doubts on the
> design of LLVM and on how some particular constructs should be mapped
> onto its bytecode representation. Please understand that these questions
> are not intended to criticize LLVM, but instead to better my
> understanding of it.

Sure.  Before we begin, let me point out that IR design is really a black
art of balancing various forces.  Adding stuff to the IR makes it more
powerful but also more complex to deal with and process (making it more
likely that bugs occur).  However, we need to add things when there are
features or capabilities that cannot be represented in LLVM.  Finally,
LLVM is not immutable: though it's quite stable now, we are always
interested in feedback and ideas. :)

> 1. Opcodes and intrinsics
>
> Which are the differences between opcodes and intrinsics? How is it
> determined, for an operation, to implement it as an opcode or as an
> intrinsic function?

This is a hard thing to really decide, as there frankly isn't much
difference.  Adding intrinsics is far easier than adding instructions, but
intrinsics cannot do some things (for example, they cannot be
terminator instructions).  I tend to prefer to make only very simple and
orthogonal operations be the instructions, relegating the rest to be
intrinsics.  In practice, the biggest difference might be in the bytecode
encoding: instructions are encoded more densely than intrinsics.

> As I understand it, compilation passes can both lower intrinsics into
> opcodes and also replace opcode sequences, so in the end some of them
> are interchangeable. For example, why is there a store opcode and a
> llvm_gcwrite intrinsic?

As pointed out by others, these are completely different operations.  In
particular, the gcwrite intrinsic is used by the front-end to indicate
that a write-barrier may be needed.  Depending on the implementation of
the garbage collector, this may expand to write-barrier code or to just a
normal store instruction.

> Couldn't the front-end just produce stores/volatile stores and then a
> compilation pass transform them into a write-barrier if necessary?

Sort of.  The problem with this is that (without gcwrite) there is no way
to identify the stores that should be turned into write barriers.  In
particular, the heap may have multiple parts to it, some of which are GC'd
and some are not.  For example, the .NET framework has different pointer
types for managed and unmanaged pointers: without the ability to represent
both in LLVM, we could not support it.
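
To make the distinction concrete, here is a rough sketch in C (not LLVM
code) of a plain store next to one possible expansion of a write barrier.
It assumes a card-marking collector; gc_heap, card_table, plain_store,
barrier_store and the sizes are all made up for illustration, and a real
collector may want a very different barrier or none at all.

```c
#include <stddef.h>

/* Hypothetical collector state; none of these names come from LLVM. */
#define HEAP_SLOTS     1024
#define SLOTS_PER_CARD 64
#define NUM_CARDS      (HEAP_SLOTS / SLOTS_PER_CARD)

static void *gc_heap[HEAP_SLOTS];            /* the managed part of the heap */
static unsigned char card_table[NUM_CARDS];  /* 1 = card was written to */

/* What an ordinary store means: write the value, nothing else. */
static void plain_store(void **slot, void *value) {
    *slot = value;
}

/* One possible expansion of a barriered write (assumption: card marking).
   Only valid for slots inside gc_heap, which is exactly the information
   a plain store does not carry. */
static void barrier_store(void **slot, void *value) {
    *slot = value;
    card_table[(size_t)(slot - gc_heap) / SLOTS_PER_CARD] = 1;
}
```

A pass looking only at plain_store calls could not tell which ones target
gc_heap, which is why the front-end has to mark them.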

> A possible view of intrinsics could be "operations that don't depend on
> the target architecture, but instead on the language runtime".

There are at least these categories of intrinsics:

1. Target specific intrinsics: these are things like llvm.returnaddress
   and the varargs stuff that are specific to code generators: there is
   simply no way to express this in LLVM without an intrinsic.  These
   could be made into LLVM instructions, but would provide no real
   advantage if we did so.
2. GC intrinsics: These are their own category because they require
   support from the LLVM code generator and GC runtime library to
   interpret.
3. Extended language intrinsics.  These include llvm.memcpy and friends.
   Note that the intrinsics are different from (and more powerful than)
   their libc equivalents, because they allow alignment info to be
   expressed.
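
As a rough illustration of why alignment info matters for item 3, here is
a C sketch (not the intrinsic itself).  copy_unknown_align and copy_align4
are hypothetical names; the second one assumes the caller promises 4-byte
aligned pointers and a length that is a multiple of 4, which is the kind
of fact a plain libc memcpy call has no way to state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* No alignment promise: must be correct for arbitrary pointers. */
static void copy_unknown_align(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}

/* Assumption: dst and src are 4-byte aligned, point at 32-bit words,
   and n is a multiple of 4.  With that promise the copy can be done
   one word at a time without any checks. */
static void copy_align4(void *dst, const void *src, size_t n) {
    uint32_t *d = dst;
    const uint32_t *s = src;
    for (size_t i = 0; i < n / 4; ++i)
        d[i] = s[i];
}
```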

> But then wouldn't malloc/free be intrinsics?

Heh, good question.  This is largely historical, but you're right, they
probably should have been made intrinsics.  Perhaps in the future they
will be converted to be intrinsics, but for now they remain instructions.

> 2. Stack and registers
>
> As the LLVM instruction set has a potentially infinite number of
> registers which are mapped onto target registers or the stack by the
> register allocator,

Yes.

> why is there a separate stack? I would understand it, if the stack was
> more accessible, as a way to implement closures, but it has been
> repeated here that the correct way to do that is to use heap-allocated
> structures, as functions can't access other functions' stacks. Is it to
> signal locations that need to be changed in-place?

I'm not sure what you mean.  In particular, the alloca instruction is used
to explicitly allocate stack space.  Because it is not possible to take
the address of LLVM registers, this is the mechanism that we use to
allocate stack space.  Certain languages do not need a stack, and thus do
not need to use alloca; other languages (e.g. C) do.  If you clarify your
question I'll be able to give a more satisfactory answer.
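
For example, in C terms (a small sketch with made-up function names), the
difference shows up as soon as a local variable has its address taken:

```c
/* 'sum' and 'i' never have their address taken, so a front-end can keep
   them entirely in LLVM registers; no stack slot is required. */
static int sum_to(int n) {
    int sum = 0;
    for (int i = 1; i <= n; ++i)
        sum += i;
    return sum;
}

/* Writes through a pointer, so the callee needs a real memory address. */
static void bump(int *p) {
    *p += 1;
}

/* 'counter' has its address passed to bump(), so it must live in memory;
   this is the kind of variable a C front-end allocates with alloca. */
static int count_twice(void) {
    int counter = 0;
    bump(&counter);
    bump(&counter);
    return counter;
}
```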

> 3. Control transfer
>
> Why are the control transfer operations so high level when compared to
> actual processors? Usually processors have instructions to jump to a
> concrete location and everything else is managed by the compiler (saving
> into the stack, getting result parameters, etc.) depending on the
> language's calling conventions.

The idea is to make the producer of the LLVM code as independent from the
target as possible.  In particular, a front-end for a type safe language
should be able to produce a type-safe module that works on all targets.

> In LLVM there's just one way to transfer control, and the only proposal
> I've seen
> (http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps
> this high level. What are the difficulties in having low level transfer
> control operations, with explicitly managed arguments, saving registers,
> etc?

I'm not sure what it is that you are trying to do.  The abstraction
provided by the call instruction is good for the optimizers and analyses
(because they see exactly what they need, not extra details), good for
compilation speed, and good for target independence.  Custom calling
conventions (link above) are specifically designed to give the code
generator the flexibility to pick the most efficient calling convention
for a particular call/return pair.

Given all of the above (efficient compiled code, easy to write
analysis/xforms, happy front-ends, fast compile times), I'm not sure what
your approach would give.  Care to elaborate?

-Chris

-- 
http://llvm.org/
http://nondot.org/sabre/