[LLVMdev] PHP Zend LLVM extension (SoC)

Thu Apr 24 01:08:13 PDT 2008

Hi Nuno,

this can be a great project. Some PHP opcodes can be optimized a lot by LLVM
(like branches or function calls), while others, like operations on variables,
can't be optimized so easily due to the dynamic nature of PHP. For the latter
maybe you can use some automatic type inference, like that used in languages
such as Haskell, but this is a big project, and there are also mixed cases
like adding a number to a string. For those, I think you can keep using the
PHP handlers for now. Even so, I expect the speed gain to be considerable.
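A rough illustration of the "specialize when types are known, otherwise call the PHP handler" idea follows. It is a hypothetical sketch in Python, not LLVM or Zend code: `generic_add_handler` is a simplified stand-in for the runtime's ADD handler, and `compile_add` pretends to be the code generator choosing an implementation from inferred operand types.

```python
def generic_add_handler(a, b):
    """Hypothetical stand-in for the runtime's generic ADD handler.
    Only ints, floats and fully numeric strings are handled; PHP's real
    coercion rules are more involved."""
    def to_number(v):
        if isinstance(v, (int, float)):
            return v
        if isinstance(v, str):
            try:
                return int(v)
            except ValueError:
                return float(v)  # raises ValueError for non-numeric strings
        raise TypeError(f"unsupported operand: {v!r}")
    return to_number(a) + to_number(b)


def compile_add(left_type, right_type):
    """Pick an ADD implementation at compile time from inferred types."""
    if left_type is int and right_type is int:
        # Fast path: both types are statically known, so emit a plain add.
        # A real backend would also need an overflow check, since PHP
        # promotes an overflowing integer result to float.
        return lambda a, b: a + b
    # Unknown or mixed types (e.g. string + int): keep the generic handler.
    return generic_add_handler


fast = compile_add(int, int)
slow = compile_add(str, int)
print(fast(2, 3))    # 5
print(slow("5", 2))  # 7
```

The point of the split is that only the cases the inference can prove get the cheap code; everything else still goes through the existing handler, so correctness does not depend on the inference being complete.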
Another thing you can do with only a little more work is to create an
abstraction layer between the webserver module and the content source, one
that works only with LLVM-compiled files (.bc). In that scenario you can
compile PHP files to the LLVM .bc format. These files can also be used as a
cache, eliminating future parsing and compilation time. The speed gain can be
very high, because on heavily accessed sites some pages are requested hundreds
of times per minute. Where needed, the generated .bc files will call the
handlers from the PHP runtime and libraries.
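The caching idea can be sketched as a lookup keyed by source path and modification time, so a page is only recompiled after it changes. This is a hypothetical Python sketch: `compile_fn` stands in for the PHP-to-bitcode compiler and is an assumption, and the entries are kept in memory for brevity where a real module would persist the .bc files on disk.

```python
import os
import tempfile


class BitcodeCache:
    """Hypothetical sketch: recompile a PHP file only when its mtime changes.
    `compile_fn` stands in for the PHP -> LLVM bitcode compiler."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self.entries = {}   # path -> (mtime_ns, compiled artifact)
        self.compiles = 0   # how many times the compiler actually ran

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self.entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # hit: skip parsing and compiling entirely
        artifact = self.compile_fn(path)
        self.compiles += 1
        self.entries[path] = (mtime, artifact)
        return artifact


# Usage with a fake compiler and a throwaway PHP file.
with tempfile.NamedTemporaryFile("w", suffix=".php", delete=False) as f:
    f.write("<?php echo 1 + 2;")
    path = f.name

cache = BitcodeCache(lambda p: "bitcode for " + p)
cache.get(path)
cache.get(path)        # served from the cache
print(cache.compiles)  # 1
st = os.stat(path)
os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))  # simulate an edit
cache.get(path)
print(cache.compiles)  # 2
os.remove(path)
```

Keying on modification time keeps the hot path to a single `stat` call per request; a content hash would be more robust against coarse timestamps but costs a read of the source file.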
In the long term this abstraction layer, which is in fact a webserver module,
could be used with many frontends that generate .bc code from different
source languages (Ruby, Python, Lua, etc. come to mind), turning the whole
thing into a framework similar to the ones based on the Java .class or .NET
CLI formats. Of course, this can only be done if the .bc format is mature and
stable; otherwise it can only be used as a cache.

Good luck,
Razvan

> Hi,
>
> PHP has a Google Summer of Code project approved to create an LLVM extension
> for PHP's VM (Zend).
> (http://code.google.com/soc/2008/php/appinfo.html?csaid=73D5F5E282F9163F).
> I'll be mentoring that project (and the student is CC'ed).
> Although I've already contributed a few patches to clang, I haven't hacked
> LLVM much, so I would like to gather some advice before misleading the
> student too much :P
>
> So my idea is to use the current PHP parser to produce PHP bytecode and then
> convert the PHP bytecode to LLVM bitcode. The extra pass to create PHP
> bytecode seems necessary for now, as it makes things simpler on the PHP end.
> The first step would be to convert the PHP bytecode to LLVM by just
> producing function calls to the PHP interpreter's opcode handlers. This has
> two advantages: it's a simple task, and we can get something working fast.
> The disadvantage is that it would only bypass the opcode dispatcher, leaving
> little room for optimizations.
> In the second phase, we would start to inline some simple PHP opcodes,
> like arithmetic operations, by emitting LLVM assembly instead of calling
> the opcode handler. Eventually we could reach a point where no opcode
> handlers are necessary.
>
> So does this look like a sane approach? Any helpful advice?
> Another question: after having the LLVM assembly, how should the binary code
> be produced, loaded into memory, and then executed? I assume we can link
> directly against the LLVM code generation and optimization libs. And does it
> support emitting the code directly into memory so that we can run it from
> there without much magic (and then cache it somewhere)?
>
>
> Thanks,
> Nuno
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
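The two phases Nuno describes (first replace the dispatcher with direct handler calls, then inline simple opcodes) can be illustrated with a toy model. This is a hypothetical Python sketch: the tuple-based bytecode, the opcode names and the `h_const`/`h_add` handlers are invented for illustration and are not Zend's real format, and Python's `exec` merely stands in for the step where a real translator would build LLVM IR and hand it to LLVM's JIT.

```python
# Toy register bytecode (hypothetical): (opcode, dst, operand1, operand2).
PROGRAM = [
    ("CONST", 0, 2, None),
    ("CONST", 1, 40, None),
    ("ADD",   2, 0, 1),
]


def h_const(regs, dst, value, _unused):
    regs[dst] = value


def h_add(regs, dst, a, b):
    regs[dst] = regs[a] + regs[b]


HANDLERS = {"CONST": h_const, "ADD": h_add}


def interpret(program, nregs=3):
    """Baseline: the dispatcher looks up a handler for every instruction."""
    regs = [None] * nregs
    for op, x, y, z in program:
        HANDLERS[op](regs, x, y, z)
    return regs


def translate(program, inline_add=False):
    """Emit straight-line source with no dispatch loop.
    Phase 1: each instruction becomes a direct call to its handler.
    Phase 2 (inline_add=True): ADD is emitted inline instead of called."""
    lines = ["def compiled(regs):"]
    for op, x, y, z in program:
        if inline_add and op == "ADD":
            lines.append(f"    regs[{x}] = regs[{y}] + regs[{z}]")
        else:
            lines.append(f"    h_{op.lower()}(regs, {x}, {y!r}, {z!r})")
    lines.append("    return regs")
    return "\n".join(lines)


def load(source):
    """Stand-in for JIT-compiling the emitted code into memory."""
    namespace = {"h_const": h_const, "h_add": h_add}
    exec(source, namespace)
    return namespace["compiled"]


print(interpret(PROGRAM))                    # [2, 40, 42]
print(load(translate(PROGRAM))([None] * 3))  # [2, 40, 42]
print(translate(PROGRAM, inline_add=True))
```

Phase 1 removes only the per-instruction lookup, which matches the concern above that little is left to optimize; phase 2 exposes the arithmetic itself, which is what lets an optimizer such as LLVM's fold or reorder it.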