[LLVMdev] Some question on LLVM design

Reid Spencer reid at x10sys.com
Sat Oct 23 11:11:39 PDT 2004


On Fri, 2004-10-22 at 06:18, Marc Ordinas i Llopis wrote:
> Hi everybody,
> 
Hi Marc

> I'm currently looking at LLVM as a possible back-end to a dynamic 
> programming system (in the tradition of Smalltalk) we are developing. 

Great!

> I 
> have read most of the llvmdev archives, and I'm aware that some things 
> are 'planned' but not implemented yet. We are willing to contribute the 
> code we'll need for our project, but before I can start coding I'll have 
> to submit to my boss a very concrete proposal on which changes I'll make 
> and how long they're going to take.
> 
> So before I can present a concrete proposal, I have some doubts on the 
> design of LLVM and on how some particular constructs should be mapped 
> onto its bytecode representation. Please understand that these questions 
> are not intended to criticize LLVM, but instead to better my 
> understanding of it.

Okay, I'll take a crack at it. Others will probably want to give you
better answers than mine :)

> 
> 1. Opcodes and intrinsics
> 
> Which are the differences between opcodes and intrinsics? How is it 
> determined, for an operation, to implement it as an opcode or as an 
> intrinsic function?

The opcodes are generally fixed as they represent the LLVM mid-level IR.
Changing the opcode set can have wide-reaching impact on all the
analysis, transform, and codegen passes in LLVM, so it's not a change
taken lightly. However, when it makes sense, they are added
occasionally. For example, we recently added the "unreachable"
instruction which allows a front end compiler to identify code locations
that should not be reached. This can help some of the passes. 
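For instance, a front end can mark the code after a call to a function
that never returns. A rough sketch (using later LLVM assembly syntax
purely for illustration; the exact syntax has varied between releases):

    declare void @abort()      ; a function known never to return

    define void @f() {
    entry:
      call void @abort()
      unreachable              ; tells the optimizers this point is never reached
    }

The optimizers can then delete anything dominated by that point instead
of having to conservatively assume the call returns.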

As for intrinsics, these are basically function calls that LLVM knows
about. For example, things like memset and memcpy could be implemented
as a function but could also be implemented with direct code if the
processor supports it. Intrinsics are placeholders for either code
generation or invocation of a runtime library function. This is probably
where you'd want to extend things.
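For example, a front end can emit a call to the memcpy intrinsic and let
each target decide whether to expand it inline or fall back to the C
library. A sketch (the intrinsic's exact name and signature differ
between LLVM versions, so treat this as illustrative):

    declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

    define void @copy(ptr %dst, ptr %src, i64 %len) {
      ; a target may lower this to inline code or to a call to libc memcpy
      call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 %len, i1 false)
      ret void
    }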

> As I understand it, compilation passes can both lower intrinsics into 
> opcodes and also replace opcode sequences, so in the end some of them 
> are interchangeable. For example, why is there a store opcode and a 
> llvm_gcwrite intrinsic? Couldn't the front-end just produce 
> stores/volatile stores and then a compilation pass transform them into a 
> write-barrier if necessary?

I believe the llvm_gcwrite intrinsic is for garbage collection, which is
entirely optional. If all stores were to have write-barrier semantics
there could be significant performance penalties.
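In other words, the front end chooses per store whether barrier
semantics are needed (sketch; the gcwrite signature here is
illustrative, not exact):

    store ptr %val, ptr %field                               ; ordinary store, no barrier
    call void @llvm.gcwrite(ptr %val, ptr %obj, ptr %field)  ; store through the collector's write barrier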

> A possible view of intrinsics could be "operations that don't depend on 
> the target architecture, but instead on the language runtime". But then 
> wouldn't malloc/free be intrinsics?

Intrinsics are intended to be replaceable by the target's code
generation, if possible/necessary, or emulated with a function call if
not. Language runtimes should be just that: calls to functions located
in libraries.

> 
> 2. Stack and registers
> 
> As the LLVM instruction set has a potentially infinite number of 
> registers which are mapped onto target registers or the stack by the 
> register allocator, why is there a separate stack? I would understand 
> it, if the stack was more accessible, as a way to implement closures, 
> but it has been repeated here that the correct way to do that is to use 
> heap-allocated structures, as functions can't access other functions' 
> stacks. Is it to signal locations that need to be changed in-place?

I'm not sure what you mean by a "separate stack". Register allocation
can spill registers to THE stack. I'll let someone more knowledgeable
about this answer.
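One way to look at it: SSA virtual registers have no address, so
anything whose address is taken, or that is mutated in ways SSA can't
express directly, lives in an explicit stack slot created by alloca; the
mem2reg pass can later promote such slots back into registers when no
address escapes. A sketch:

    define i32 @g() {
      %p = alloca i32        ; explicit, addressable stack slot
      store i32 42, ptr %p   ; mutate it in place through its address
      %v = load i32, ptr %p  ; read it back
      ret i32 %v
    }

(That's my reading, anyway; others may correct me on the finer points.)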

> 3. Control transfer
> 
> Why are the control transfer operations so high level when compared to 
> actual processors? Usually processors have instructions to jump to a 
> concrete location and everything else is managed by the compiler (saving 
> into the stack, getting result parameters, etc.) depending on the 
> language's calling conventions. In LLVM there's just one way to transfer 
> control, and the only proposal I've seen 
> (http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps 
> this high level. What are the difficulties in having low level transfer 
> control operations, with explicitly managed arguments, saving registers, 
> etc?

I don't think there are any difficulties in having low-level control
transfer operations; I think it's more that we don't want them. The point
behind the mid-level SSA IR is to make it simple to generate code from a
compiler front end and to not restrict choices that could be made by
analysis, transform, and codegen passes. The fact that we have a
relatively high level for control transfer operations goes right along
with the rest of the LLVM IR (e.g. gep, call, typed operators, malloc,
free). 

The point is to make a simple, consistent, small IR with the following
goals:

* front ends can easily generate code for it because the number of
  instructions is small and the model is simple.
* analyses and transforms are simplified because they don't have to
  reason over mind-numbing complexity of lower-level instruction sets
* code generation has significant freedom in the code it generates
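Concretely, a call in the IR says only what is passed and what comes
back; argument passing, register saving, and stack frame layout are
entirely the code generator's problem (sketch, illustrative syntax):

    ; the IR records operands and result, nothing about the
    ; target's calling convention
    %sum = call i32 @add(i32 %a, i32 %b)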

> 
> Well, that's all for now. Thanks in advance,
> 
> Marc Ordinas i Llopis | Tragnarion Studios

Thanks for the interesting questions!

Reid.