[LLVMdev] me being stupid: me vs the llvm codebase...

Gordon Henriksen gordonhenriksen at mac.com
Tue Oct 23 08:45:48 PDT 2007


On Oct 23, 2007, at 05:52, BGB wrote:

> I am assuming then that some external assembler is used (such as  
> 'gas')?...

In the static compilers, yes. The JIT directly serializes  
instructions into memory without the aid of an external assembler.  
There are also experimental built-in assemblers; LLVM calls them  
object writers[1].

> it looks like much of the interconnection and data sharing is done  
> through objects and templates?...

That's correct. The LLVM intermediate representation (IR) is well-
suited for many transformations and analyses, which are generally
structured as passes[2]. The LLVM IR has object-oriented[3],
textual (.ll)[4], and binary (.bc "bitcode")[5] representations;
all are fully equivalent. However, it is more efficient not to wring
the program through multiple print/parse or write/read cycles, so the
object-oriented representation is generally maintained within any
single process.
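
To illustrate the textual form, here is a trivial (made-up) function
in .ll syntax; llvm-as converts this to the equivalent .bc bitcode,
and llvm-dis converts it back:

```llvm
; add two 32-bit integers and return the sum
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
```

The same function exists in-process as llvm::Function and
llvm::Instruction objects; the three forms carry identical information.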

The code generators also convert the program into the SelectionDAG
and MachineFunction forms, both of which are target-independent in
form but not in content.[6] Each of these forms has multiple states
with differing invariants. (Strictly speaking, however, these forms
are private to each code generator; the C backend does not use
either.) These code generation forms do not have first-class textual
or binary representations, since they are ephemeral data structures
used only during code generation. They can, however, be dumped to
human-readable text, or viewed with Graphviz.

> doesn't appear much like working like a dynamic compiler is a major  
> design goal (so I hear, it can be used this way, but this is not  
> the focus).
>
> so, it looks like the design focuses mostly of taking the input  
> modules, grinding it and mixing it, and doing lots of spify inter- 
> module optimizations (presumably forming a monolithic output  
> representing the entire project?...).

LLVM does work well as a static (offline) compiler, where inter-
procedural optimization and link-time optimization are useful. In
llvm-gcc, link-time optimization ("mixing" as you say) only occurs
at -O4. Typically, IPO is performed only within a single
compilation unit (-O3/-O2). No IPO is performed at -O0.
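
As a sketch (file names are made up, and the exact behavior of
llvm-gcc's flags may vary by version):

```
llvm-gcc -O4 -c a.c -o a.o   # at -O4, the .o files contain LLVM bitcode
llvm-gcc -O4 -c b.c -o b.o
llvm-gcc -O4 a.o b.o -o app  # LTO runs across both modules at link time
```

At -O2/-O3 the .o files would instead contain native code, and any IPO
would have already happened within a.c and b.c individually.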

> as a result, my compiler generally refrains from inlining things or  
> doing brittle inter-function optimizations (after all, one could  
> potentially relink parts of the image and break things...).

It's possible to use LLVM in the same manner by simply refraining  
from the use of inter-procedural optimizations.

If LLVM bytecode is used as the on-disk representation, however, LLVM  
would allow the use of offline optimizations before starting the JIT  
program. This could include IPO or LTO at the developer's option, and  
would be entirely safe if the unit of dynamism were restricted to an  
LLVM module, since LTO merges modules together.
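
One plausible offline pipeline for that (tool names are current, but
treat the exact flags as approximate):

```
llvm-link mod1.bc mod2.bc -o merged.bc      # LTO: merge modules into one
opt -std-compile-opts merged.bc -o opt.bc   # offline optimization, incl. IPO
lli opt.bc                                  # start the JIT on the result
```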

> how well would LLVM work for being used in a manner comprable to  
> LISP-style eval (or Self, Smalltalk, or Python style incremental  
> restructuring)?...

Simply codegen the string into a function at runtime, JIT it, and  
call it.[7] Afterwards, the IR and the machine code representation  
can be deleted.
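
In outline, the eval path looks roughly like this C++ pseudocode
sketch (API and function names here are approximate for this era of
LLVM, and "anon_expr" is a hypothetical name your front end would
assign, not anything LLVM provides):

```cpp
// Pseudocode sketch -- approximate API, not exact signatures.
int evalOnce(llvm::Module *M, llvm::ExecutionEngine *EE,
             const std::string &IRText) {
  // 1. Codegen the string into a function. Here we parse textual IR
  //    into M; a real front end would build IR with the C++ API.
  llvm::ParseAssemblyString(IRText.c_str(), M);
  llvm::Function *F = M->getFunction("anon_expr"); // hypothetical name

  // 2. JIT it and call it.
  typedef int (*FnPtr)();
  FnPtr FP = (FnPtr) EE->getPointerToFunction(F);
  int Result = FP();

  // 3. Afterwards, the machine code and the IR can both be deleted.
  EE->freeMachineCodeForFunction(F);
  F->eraseFromParent();
  return Result;
}
```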

> and incrementally replacing functions or modules at runtime?...

Generally speaking, LLVM neither helps nor hinders here. Maybe
someone will follow up with whether the JIT uses stub functions,
which would enable dynamic relinking. If not, it would be a
straightforward, if platform-specific, feature to add.

— Gordon


[1]
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/ELFWriter.cpp?view=markup
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachOWriter.cpp?view=markup

[2] http://llvm.org/docs/WritingAnLLVMPass.html

[3] http://llvm.org/docs/ProgrammersManual.html#coreclasses

[4] http://llvm.org/docs/LangRef.html

[5] http://llvm.org/docs/BitCodeFormat.html

[6] http://llvm.org/docs/CodeGenerator.html

[7] watch this space, currently under rapid construction:
http://llvm.org/docs/tutorial/
In particular, observe the HandleTopLevelExpression function in §3.3
"Implementing Code Generation to LLVM IR." That function will be
extended to handle the eval usage in §3.4 "Adding JIT and Optimizer
Support."
