[LLVMdev] libc dependencies, code generation questions

Thu Jun 7 02:22:11 PDT 2007

Hi Jonas,

> I'm looking into creating an llvm backend for the Free Pascal  
> Compiler (<http://www.freepascal.org>). After reading a bit through  
> the documentation and looking at some code generated by llvm-gcc, I  
> have a couple of questions:

I've been working on getting the gcc Ada front-end to work in llvm-gcc.
Since Ada evolved from Pascal, it may be worth your while to see how
llvm-gcc handles these kinds of issues (most of the work is done in
llvm-convert.cpp).

> 1) is there a way to specify ranges in the switch statement? Pascal  
> supports switch statements (called "case" statements there) which  
> look like this:
> 
> case <expr> of
>    1..1000000: dothis;
>    1000001..1000000000: dothat;
> end;
> 
> Generating a switch statement with 10^9 individual entries is not  
> really feasible in practice. We can of course map all "large" ranges  
> in case statements into equivalent if-statements, but that largely  
> defeats the elegance and ease of use of the switch statement for us :)

Currently llvm-gcc maps large ranges into if-statements; see
TreeToLLVM::EmitSWITCH_EXPR.  Adding support for ranges to LLVM
is bug 1255.  It will doubtless happen: it doesn't seem hard to
do, and I understand that the switch lowering code already
generates such ranges internally anyway.
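
To make the if-statement mapping concrete, here is a minimal sketch of
how a front end might lower the two ranges from your example into range
checks.  The names (Arm, in_range, select_arm) are hypothetical, and the
single unsigned comparison is a common idiom that I am assuming here; it
is not necessarily what EmitSWITCH_EXPR emits.  It needs C++14 or later.

```cpp
#include <cstdint>

// Hypothetical labels for the arms of the Pascal case statement.
enum class Arm { DoThis, DoThat, None };

// Tests lo <= x <= hi with one unsigned comparison (requires lo <= hi).
// The subtraction is done in uint64_t, where wraparound is well defined,
// so any x below lo wraps to a huge value and fails the comparison.
constexpr bool in_range(std::int64_t x, std::int64_t lo, std::int64_t hi) {
    return static_cast<std::uint64_t>(x) - static_cast<std::uint64_t>(lo)
        <= static_cast<std::uint64_t>(hi) - static_cast<std::uint64_t>(lo);
}

// Each large range becomes one if-statement instead of up to 10^9
// individual switch entries.
constexpr Arm select_arm(std::int64_t x) {
    if (in_range(x, 1, 1000000)) return Arm::DoThis;
    if (in_range(x, 1000001, 1000000000)) return Arm::DoThat;
    return Arm::None;
}
```

Small ranges could still be expanded into individual switch cases, with
only the large ones handled this way.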

> 2) I assume llvm sometimes adds implicit calls to functions in the C  
> library, e.g. for llvm.malloc, llvm.free, some floating point  
> routines and some others. Is there a policy regarding which llvm  
> opcodes may result in C library dependencies and which not? The  
> reason I ask is that we try to only depend on stable system  
> interfaces (in the sense of interfaces which are the most unlikely to  
> break backwards binary compatibility), and on a number of OSes (such  
> as Linux) this means using system calls rather than libc.
> 
> We have our own alternate implementations of all the functionality  
> expressed by the "high level" llvm opcodes, but I don't know if there  
> is a mechanism available to redirect these from their (presumed)  
> standard libc dependencies to our own routines.

I don't think LLVM spontaneously creates calls to these; it just knows how
to optimize them if the calls were already in the IR fed to it.

> 3) we support inline assembler in the same way that Turbo Pascal and  
> Delphi did: you just type in code without telling the compiler what  
> registers or memory locations this routine clobbers, and the compiler  
> thus cannot make any assumptions about them (other than what the ABI/ 
> calling convention specifies). As far as llvm is concerned, they  
> should be semantically equivalent to calling an external routine  
> which was not compiled to llvm ir. Is there a generic way to tell  
> this to llvm, or should one simply specify all volatile registers as  
> read and clobbered, and the same for memory?

I think you should just specify that everything is clobbered in the
inline asm you generate in the LLVM IR.
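
As a rough illustration of "everything is clobbered", here is a sketch
using GCC-style extended asm from C++ rather than LLVM IR itself.  The
helper names are hypothetical, the asm body is left empty as a stand-in
for the user's code, and the register list covers only the x86-64
System V caller-saved integer registers.  That list is my assumption; a
real front end would take it from the target's calling convention and
would also list the volatile vector and floating-point registers.

```cpp
#include <cassert>

// Hypothetical stand-in for an opaque Pascal asm block: the compiler is
// told that memory, the flags, and the volatile integer registers may
// all have been modified.
inline void opaque_asm_block() {
#if defined(__GNUC__) && defined(__x86_64__)
    asm volatile("" : : : "memory", "cc",
                 "rax", "rcx", "rdx", "rsi", "rdi",
                 "r8", "r9", "r10", "r11");
#elif defined(__GNUC__)
    // Other targets: only the memory clobber is portable.
    asm volatile("" : : : "memory");
#endif
}

// Because the clobbers are declared, the compiler must keep `local` in a
// callee-saved register or on the stack across the block.
int value_across_block(int v) {
    int local = v;
    opaque_asm_block();
    assert(local == v);
    return local;
}
```

The effect is much the same as calling an external routine that obeys
the ABI: nothing held in a volatile register or cached from memory is
assumed to survive.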

> 4) to what extent is the front end (i.e., our compiler) responsible  
> for code selection and optimization? In other words, should we spend  
> a lot of time on converting if-statements to select-based predicates  
> and things like this, or will this be done by llvm afterwards anyway?  
> What about vectorization? Are there particular kinds of optimizations  
> which llvm will probably never be very good at (or which are not  
> llvm's focus in the near to middle term), and which thus should  
> definitely be done at a higher level?

In llvm-gcc, front-ends do very little optimization.  Constant folding
occurs (gcc does this automagically as you create your gcc trees) and
some common idioms are recognized and output as something well adapted
to LLVM optimization.  But basically all the optimization is left to
LLVM.  LLVM can certainly turn if statements into switches.

Ciao,

Duncan.