[LLVMdev] Another LLVM JIT extension to Python

Sat Jun 30 01:41:01 PDT 2012

On 06/29/2012 07:44 PM, Siu Kwan Lam wrote:
> On 06/29/2012 02:47 AM, Tobias Grosser wrote:
>> On 06/29/2012 01:06 AM, Siu Kwan Lam wrote:
>>> Dear LLVM,
>>>
>>> I am a young developer who has just uploaded my first open-source
>>> project based on LLVM. I would like to know what professionals think of
>>> my project.
>>>
>>> I have started a JIT extension to Python called Pymothoa
>>> (http://code.google.com/p/pymothoa/). Unlike other similar projects, I
>>> did not modify the interpreter. Pymothoa uses Python decorators to mark
>>> functions for JIT compilation. It works on the AST generated by Python,
>>> so it uses the same syntax as Python but behaves like a low-level
>>> programming language such as C. The goal is to get a speedup in
>>> compute-intensive code without writing C extensions.
>>>
>>> If you are interested, there are two demo applications in the source
>>> tree: matrix-matrix multiplication and reduce-sum. I would appreciate
>>> any comments.
>>>
>>> Siu Kwan Lam
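Pymothoa's own decorator API is not shown in this thread. As a minimal sketch of the general idea of reading a function's structure from Python's AST, the following uses only the standard `ast` module. `KERNEL_SOURCE`, `KernelSummary` and `summarize` are hypothetical names for illustration; a decorator-based JIT would typically obtain the source text with `inspect.getsource()` instead of a string literal.

```python
import ast

# Hypothetical kernel written as source text so the example is self-contained.
KERNEL_SOURCE = """
def reduce_sum(a, n):
    total = 0.0
    for i in range(n):
        total += a[i]
    return total
"""


class KernelSummary(ast.NodeVisitor):
    """Collects the pieces a C-like code generator would have to lower:
    the function name, its parameters, loop variables and returns."""

    def __init__(self):
        self.name = None
        self.params = []
        self.loop_vars = []
        self.has_return = False

    def visit_FunctionDef(self, node):
        self.name = node.name
        self.params = [arg.arg for arg in node.args.args]
        self.generic_visit(node)  # keep walking into the function body

    def visit_For(self, node):
        # Only simple "for <name> in ..." loops are recorded here.
        if isinstance(node.target, ast.Name):
            self.loop_vars.append(node.target.id)
        self.generic_visit(node)

    def visit_Return(self, node):
        self.has_return = True
        self.generic_visit(node)


def summarize(source):
    """Parse source text and return a populated KernelSummary."""
    summary = KernelSummary()
    summary.visit(ast.parse(source))
    return summary


s = summarize(KERNEL_SOURCE)
print(s.name, s.params, s.loop_vars, s.has_return)
# prints: reduce_sum ['a', 'n'] ['i'] True
```

A real front end would go on to map each visited node to IR instructions; the visitor pattern shown here is only the traversal part.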
>>
>> Hi Siu Kwan Lam,
>>
>> that looks very interesting! It is very nice to see how easy it is to
>> install and how easy it is to add proper function annotations. Also,
>> the generated source code seems to be a good start. It would be
>> interesting to try it with Polly [1]. I believe that this could give
>> great speedups for the naive matrix multiply implementation.
>> Is there a way I can dump the content of the entire LLVM-IR module
>> generated in the demo/matrixmul/matrixmul.py example?
>>
>> Cheers
>> Tobi
>>
>> [1] http://polly.llvm.org
>>
> Hi Tobi,
>
> Thank you for your feedback. I will be looking at Polly for better
> locality optimization. Can I simply include Polly's optimizations as
> passes? If so, pymothoa/llvm_backend/default_passes.py can easily be
> edited to add new passes. I am still trying to figure out which
> optimization passes to include for the best results.

You need to load the Polly.so object file. After the file is loaded, all 
Polly passes are automatically available. To run them, you have two options:

1) Add them to the pass list

This is a rather long list of additional passes. The passes we add can 
be seen in lib/RegisterPasses.cpp (note that you also need the 
preparatory transformations).

2) Use the pass manager builder

Look at PassManagerBuilder::populateFunctionPassManager() in 
llvm/Transforms/IPO/PassManagerBuilder.h. At -O3, and with the -polly 
command line option enabled (I have no idea how that would work in your 
setup), the Polly passes are part of the normal -O3 passes.
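Outside the JIT, the same passes can be tried with the opt tool on a dumped module. A minimal sketch that only assembles the command lines and does not run them; the shared-object path and the helpers `polly_commands` and `as_shell_pipeline` are hypothetical, and the `-load`, `-O3` and `-polly` flags are the opt options used in this thread (other LLVM versions may load plugins differently). The `two_stage` variant mirrors the 'opt -O3 | opt -O3 -polly' workaround from item 4 of this message.

```python
import shlex


def polly_commands(polly_so, input_ll, two_stage=False):
    """Return argv lists for running opt with Polly loaded.

    polly_so: path to the Polly shared object (hypothetical location).
    two_stage: run a plain -O3 first and pipe its output into a second
    opt that runs -O3 -polly.
    """
    polly_stage = ["opt", "-load", polly_so, "-O3", "-polly"]
    if not two_stage:
        return [polly_stage + [input_ll]]
    # The first opt reads the file; the second reads bitcode from stdin.
    return [["opt", "-O3", input_ll], polly_stage]


def as_shell_pipeline(commands):
    """Render the argv lists as one shell pipeline string (quoted safely)."""
    return " | ".join(shlex.join(cmd) for cmd in commands)


print(as_shell_pipeline(
    polly_commands("path/to/Polly.so", "matrixmul.ll", two_stage=True)))
```

Keeping the commands as argv lists makes them easy to pass to `subprocess` later, while the string form is convenient for pasting into a terminal.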

>> Is there a way I can dump the content of the entire LLVM-IR module
>> generated in the demo/matrixmul/matrixmul.py example?
> You can do so by printing the default_module:
>
> print(default_module)

Perfect, that's what I was looking for.

> You may want to do so before optimizing with "default_module.optimize()"
> to see what my codegen is doing.
>
> I will be adding more documentation to the project wiki.
Great.
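To feed the module to opt, it helps to write the IR to a file both before and after default_module.optimize(). A minimal sketch; `dump_module` is a hypothetical helper, and it assumes, as the print tip above implies, that str() of the module yields its textual IR.

```python
from pathlib import Path


def dump_module(module, path):
    """Write the textual IR of `module` to `path` and return that text.

    Assumes str(module) produces LLVM IR, which is what printing shows.
    """
    text = str(module)
    Path(path).write_text(text)
    return text


# Hypothetical usage with Pymothoa (not executed here):
#   dump_module(default_module, "matrixmul_unopt.ll")
#   default_module.optimize()
#   dump_module(default_module, "matrixmul_opt.ll")
```

Comparing the two files shows what the default passes changed, and the unoptimized one is a good input for experimenting with opt directly.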

I just looked at the generated code. Polly cannot optimize it directly, 
but I don't see any fundamental problems. In fact, the code looks really 
nice. The main issues I have seen are:

1. The array references could alias

The arguments of the function matrixmul_naive can alias, which not only 
blocks Polly from working right now, but also makes other LLVM 
transformations less effective. If you can guarantee that the arguments 
do not alias, the best option is to add the 'noalias' parameter 
attribute [1] to those parameters.
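If the code generator emits textual IR, marking pointer parameters could look like the following minimal sketch. `define_line` is a hypothetical helper and the matrixmul_naive parameter list is assumed for illustration (the real signature is not shown in this thread); typed pointers such as float* match the IR of this era, and newer IR spells them ptr. The attribute is only valid if callers really pass non-overlapping arrays.

```python
def define_line(name, params, ret="void"):
    """Build an LLVM IR `define` header, marking pointer params noalias.

    params: list of (llvm_type, param_name) pairs (hypothetical layout).
    """
    parts = []
    for ty, pname in params:
        is_pointer = ty.endswith("*") or ty == "ptr"
        attrs = " noalias" if is_pointer else ""
        parts.append(f"{ty}{attrs} %{pname}")
    return f"define {ret} @{name}({', '.join(parts)})"


print(define_line("matrixmul_naive",
                  [("float*", "a"), ("float*", "b"), ("float*", "c"), ("i32", "n")]))
```

Only pointer parameters get the attribute, since noalias is meaningful for pointers and not for scalar arguments such as the size n.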

2. No target data set

The LLVM-IR module you are generating does not have any target data 
string set. When trying my optimizations, I manually set something like:

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

Again, this is something that will help both generic optimizations and 
Polly.
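A minimal sketch of adding these two lines to a textual module, using the x86-64 Linux values quoted above. `with_target_info` is a hypothetical helper; these values only fit that target, and a JIT would ideally take the layout and triple from the host's target machine instead. The "already present" check is a simple substring test.

```python
DATALAYOUT = ("e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-"
              "f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-"
              "f80:128:128-n8:16:32:64")
TRIPLE = "x86_64-unknown-linux-gnu"


def with_target_info(ir_text, datalayout=DATALAYOUT, triple=TRIPLE):
    """Prepend target datalayout/triple lines unless a layout is present."""
    if "target datalayout" in ir_text:
        return ir_text  # leave modules that already declare a layout alone
    header = (f'target datalayout = "{datalayout}"\n'
              f'target triple = "{triple}"\n')
    return header + ir_text
```

Prepending is enough because both lines are top-level entries of the module and are not tied to any particular position among the function definitions.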

3. Variable-size arrays

You are using variable-size arrays in the code you generate. This is 
perfectly fine, but currently not supported by Polly. As a workaround, 
setting n = 1024 at the beginning of the code is enough to make Polly 
work. The right solution is obviously to add variable-length array 
support to Polly.
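The n = 1024 workaround amounts to specializing the kernel for one size, so the loop bounds and the i * n index products involve a constant instead of a runtime argument. A minimal sketch of that source-level specialization, assuming a kernel over flattened row-major arrays; the kernel body, `TEMPLATE` and `specialize` are hypothetical and not the demo's actual code. In plain CPython this changes nothing about performance; it only shows the rewrite a JIT front end could apply before generating IR.

```python
TEMPLATE = """
def matrixmul_naive(a, b, c, n):
    n = {size}  # workaround: override the runtime size with a constant
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i * n + k] * b[k * n + j]
            c[i * n + j] = acc
"""


def specialize(size):
    """Return a copy of the kernel with the matrix size baked in."""
    namespace = {}
    exec(TEMPLATE.format(size=int(size)), namespace)  # int() keeps it a literal
    return namespace["matrixmul_naive"]


mm = specialize(2)
a, b, c = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [0.0] * 4
mm(a, b, c, 2)
print(c)  # prints: [19.0, 22.0, 43.0, 50.0]
```

The n argument is kept in the signature so callers do not change; with specialize(1024) the generated kernel matches the workaround described above.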

4. Pass ordering issue

The pass order that 'opt -O3 -polly' uses is not good enough for Polly 
to detect your code, whereas 'opt -O3 | opt -O3 -polly' works. This 
means we probably need to schedule one or two additional 
canonicalization passes. One reason for this may be that you 'alloca' 
data elements in the body of a function. Many LLVM passes always put 
the alloca instructions in the very first basic block. You may consider 
doing the same during code generation.
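A common way to do this in a code generator is to keep allocas in a separate list and always print them at the top of the entry block, no matter where codegen requested them; the LLVM Kaleidoscope tutorial uses the same idea in its CreateEntryBlockAlloca helper, and mem2reg only promotes allocas found in the entry block. A minimal sketch for a textual-IR emitter; `FunctionEmitter` and its methods are hypothetical, not part of Pymothoa.

```python
class FunctionEmitter:
    """Toy textual-IR emitter that hoists every alloca into the entry block."""

    def __init__(self, header):
        self.header = header  # e.g. 'define i32 @f()'
        self.allocas = []     # printed first, right after the entry label
        self.body = []        # everything else, in emission order

    def alloca(self, name, ty):
        """Request a stack slot; it is placed in the entry block regardless."""
        self.allocas.append(f"  %{name} = alloca {ty}")
        return f"%{name}"

    def emit(self, instruction):
        self.body.append(f"  {instruction}")

    def finish(self):
        lines = [self.header + " {", "entry:"]
        lines += self.allocas + self.body + ["}"]
        return "\n".join(lines)


fe = FunctionEmitter("define i32 @f()")
acc = fe.alloca("acc", "i32")
fe.emit(f"store i32 1, i32* {acc}")
print(fe.finish())
```

Because the alloca list is printed before any body instruction, a slot requested deep inside a loop body still ends up in the entry block, which is the layout the passes mentioned above expect.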

Again, thanks for this very nice tool. I am looking forward to playing 
more with it.

Cheers
Tobi

[1] http://llvm.org/docs/LangRef.html#paramattrs