[LLVMdev] PHP Zend LLVM extension (SoC)

Tue Apr 22 16:17:33 PDT 2008

Hi Nuno,

On Apr 22, 2008, at 18:44, Nuno Lopes wrote:

> PHP has a Google Summer of Code project approved to create an LLVM
> extension for PHP's VM (Zend):
> http://code.google.com/soc/2008/php/appinfo.html?csaid=73D5F5E282F9163F
> I'll be mentoring that project (and the student is CC'ed).
> Although I've already contributed a few patches to clang, I haven't
> hacked on LLVM much, so I would like to gather some advice before
> misleading the student too much :P

This is very exciting!

> So my idea is to use the current PHP parser to produce PHP bytecode
> and then convert the PHP bytecode to LLVM's bitcode. The extra pass
> to create PHP bytecode seems necessary for now, as it makes things
> simpler on the PHP end. The first step would be to convert the PHP
> bytecode to LLVM by just producing function calls to the PHP
> interpreter opcode handlers. This has two advantages: it's a simple
> task and we can get something working fast. The disadvantage is that
> it would only bypass the opcode dispatcher, leaving not much room for
> optimizations.

As far as I know, this is exactly how Apple's OpenGL shader JIT works  
in Mac OS X. Unfortunately, LLVM will rarely make dramatic changes to  
your memory representation, so this probably won't be as effective as  
it is in the OpenGL context. (LLVM will only do aggregate->scalar  
memory reorganizations; it probably won't be able to prove this safe  
for a dynamic language very often.) Your challenge in generating very
fast code would likely be one of type inference.
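
To make the type-inference point concrete, here is a minimal C++ sketch
of the difference between calling generic handlers and emitting typed
code. Everything in it (Value, Tag, handle_add, sum3_via_handlers,
sum3_typed) is hypothetical and only loosely modelled on a dynamically
typed VM slot; it is not Zend's or LLVM's actual representation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged value: every operand lives in memory with a type tag.
enum class Tag { Int, Double };
struct Value {
    Tag tag;
    union { std::int64_t i; double d; };
};

Value make_int(std::int64_t v) { Value x; x.tag = Tag::Int; x.i = v; return x; }
Value make_double(double v)    { Value x; x.tag = Tag::Double; x.d = v; return x; }

// Generic ADD handler (assumed shape). It knows nothing about its operands,
// so it has to inspect both tags on every call.
void handle_add(Value* result, const Value* a, const Value* b) {
    if (a->tag == Tag::Int && b->tag == Tag::Int) {
        *result = make_int(a->i + b->i);
    } else {
        double x = (a->tag == Tag::Int) ? static_cast<double>(a->i) : a->d;
        double y = (b->tag == Tag::Int) ? static_cast<double>(b->i) : b->d;
        *result = make_double(x + y);
    }
}

// "First step" style: the dispatcher is gone, but each operation is still
// a call into a handler that works on tagged values stored in memory.
Value sum3_via_handlers(const Value& a, const Value& b, const Value& c) {
    Value tmp, result;
    handle_add(&tmp, &a, &b);
    handle_add(&result, &tmp, &c);
    return result;
}

// What could be emitted if type inference proved all operands are integers:
// plain scalar arithmetic with no tags and no memory traffic.
std::int64_t sum3_typed(std::int64_t a, std::int64_t b, std::int64_t c) {
    return a + b + c;
}

// Small self-check: both paths agree for integer inputs.
void check_paths_agree() {
    Value r = sum3_via_handlers(make_int(1), make_int(2), make_int(3));
    assert(r.tag == Tag::Int && r.i == sum3_typed(1, 2, 3));
}
```

The handler version keeps the tag checks and the in-memory Value
temporaries no matter how well the calls are optimized; only knowledge
of the operand types lets a compiler reduce it to the second form.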

> In the second phase, we would start to inline some simple PHP  
> bytecodes, like arithmetic operations and so on, by dumping LLVM  
> assembly instead of calling the opcode handler. Eventually we could  
> reach a point that no opcode handlers are necessary.
>
> So does this look like a sane thing? Any helpful advice? Another
> question: After having the LLVM assembly, how should the binary code
> be produced, loaded into memory, and then executed? I assume we can
> link directly to the LLVM code generation and optimization libs. And
> does it support dumping the code directly to memory so that we
> can run it from there without much magic (and then cache it
> somewhere)?

You can use the facilities of ExecutionEngine to run code in-memory  
without ever touching the filesystem. The LLVM tutorial has  
information on how to do this.

http://llvm.org/doxygen/classllvm_1_1ExecutionEngine.html
http://llvm.org/docs/tutorial/LangImpl4.html

You'll probably want to provide your opcode handlers as an LLVM IR  
module. Your JIT can start up and “seed” the execution environment  
with the predefined handlers, then progressively incorporate more  
functions into the module as execution progresses.
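
The seed-then-grow structure can be sketched roughly as follows. This
is a plain C++ stand-in and does not use the LLVM API: FunctionTable,
seed, defer, and call are hypothetical names, and the "compile" step is
just a callback where a real JIT would generate code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using Fn = std::function<std::int64_t(std::int64_t)>;

// Hypothetical stand-in for a JIT module: callable functions looked up by name.
class FunctionTable {
public:
    // Seed: register a predefined handler that is available from startup.
    void seed(const std::string& name, Fn fn) { ready_[name] = std::move(fn); }

    // Register a function whose body is produced only when first needed.
    void defer(const std::string& name, std::function<Fn()> compile) {
        pending_[name] = std::move(compile);
    }

    // Call a function, "compiling" it into the table on first use.
    std::int64_t call(const std::string& name, std::int64_t arg) {
        auto it = ready_.find(name);
        if (it == ready_.end()) {
            auto p = pending_.find(name);
            if (p == pending_.end())
                throw std::out_of_range("unknown function: " + name);
            it = ready_.emplace(name, p->second()).first;
            pending_.erase(p);
            ++compiled_;
        }
        return it->second(arg);
    }

    // Number of deferred functions that have been compiled so far.
    int compiled_count() const { return compiled_; }

private:
    std::map<std::string, Fn> ready_;
    std::map<std::string, std::function<Fn()>> pending_;
    int compiled_ = 0;
};

// Usage: seed one handler, then add a function lazily that calls it.
void demo() {
    FunctionTable table;
    table.seed("incr", [](std::int64_t x) { return x + 1; });
    table.defer("incr_twice", [&table]() -> Fn {
        return [&table](std::int64_t x) {
            return table.call("incr", table.call("incr", x));
        };
    });
    assert(table.compiled_count() == 0);      // nothing compiled yet
    assert(table.call("incr_twice", 5) == 7); // compiled on first call
    assert(table.compiled_count() == 1);
}
```

The lookup table only illustrates the control flow; in a real JIT the
seeded handlers and the later functions would live in the same module
so that calls between them can be resolved and optimized together.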

Hope that helps,
Gordon