[LLVMdev] GSoC 2012 Proposal: Python bindings for LLVM

Baozeng sploving1 at gmail.com
Wed Mar 28 07:31:46 PDT 2012


Hello all,
Here is my GSoC 2012 proposal: Python bindings for LLVM. Any feedback is
welcome!

*Title: Python bindings for LLVM*

*Abstract:* llvm-py provides Python bindings for LLVM. The latest llvm-py
release supports only Python 2.x and LLVM 2.x. This project will update
llvm-py so that it works with LLVM 3 and with both Python 2.x and Python 3.

*Motivation*
LLVM is used as a static compiler backend and as a JIT (just-in-time)
backend on many platforms. It has a modular design and provides extensive
optimization support. llvm-py provides Python bindings for LLVM [1]. The
project began in 2008 and aims to expose enough of the LLVM API to
implement a compiler backend in pure Python. However, the latest llvm-py
works only with LLVM 2.x, not LLVM 3. Because LLVM 3 introduced several
major changes, especially to its internal APIs, llvm-py must be updated to
work with it. In addition, llvm-py currently supports only Python 2.x, not
Python 3. Supporting Python 3 would make llvm-py more complete and let more
users work with LLVM, which in turn helps LLVM's development. This project
therefore has two tasks: make llvm-py work with LLVM 3, and add Python 3
support.

*Project Detail*
Before writing this proposal, I read the llvm-py source code and gained a
basic understanding of how it works. I wrote a short document analyzing how
it is implemented (see the appendix at the end of this proposal).
In this section, I list details related to this project: working with
LLVM 3 and supporting Python 3.

*1. Working with LLVM 3*
LLVM 3 changed some of its internal APIs, so the llvm-py code must be
updated to match the modified APIs.
a. IR type system. The IR type system was reimplemented in LLVM 3. For
instance, *OpaqueType* is gone, so it should also be removed from llvm-py.
b. Value class. Two new Value subclasses (both constants) are added:
*ConstantDataArray*, an array constant;
*ConstantDataVector*, a vector constant.
llvm-py should support them.
c. Instruction class. Four new subclasses of Instruction are added:
*FenceInst*, an instruction for ordering other memory operations;
*AtomicCmpXchgInst*, an instruction that atomically checks and exchanges
values in a memory location;
*AtomicRMWInst*, an instruction that atomically reads a memory location,
combines it with another value and stores the result back;
*LandingPadInst*, an instruction that holds the information needed to
generate correct exception-handling code.
llvm-py should support them.
d. Passes. Some passes were removed, for instance the *LowerSetJmp* pass,
so the corresponding C API functions, such as LLVMAddLowerSetJmpPass,
should also be removed from llvm-py.
e. PHINode. Two new functions are added to the PHINode class, *block_begin*
and *block_end*, which give access to the list of incoming BasicBlocks. At
the same time, the reserveOperandSpace function was removed, so creating a
PHINode now takes an extra argument that specifies how many operands to
reserve space for.
When porting llvm-py to LLVM 3.0, we should focus on these changes. The
list above may not be complete; I will cover further changes during the
project.

*2. Python 3 support*
When adding Python 3 support, we also need to handle the C API changes
between Python 2.x and Python 3. Some of them are listed below.
1. Extension module initialization and finalization (PEP 3121) [2]
In Python 3, the module initialization function must look like this:
*PyObject *PyInit_<modulename>(void)*
The module is created by passing a PyModuleDef struct to PyModule_Create.
2. Making PyObject_HEAD conform to standard C (PEP 3123) [3]
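Some macros are added, for instance *Py_TYPE*, *Py_REFCNT* and *Py_SIZE*.
So code such as *func->ob_type->tp_name* in Python 2.x should be replaced
with *Py_TYPE(func)->tp_name* in Python 3. These macros are also available
in Python 2.6 and later, so the same code works with both versions.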
3. Byte vectors and String/Unicode Unification (PEP 0332) [4]
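In Python 3, *str* is a Unicode text type and binary data uses the separate
*bytes* type. At the C level the PyString_* functions are gone: byte
strings use the PyBytes_* functions and text uses PyUnicode_*.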
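On the Python side, the split matters for every name passed to the C wrappers, which parse strings with PyArg_ParseTuple's `s` format (see the appendix). In Python 3, `s` accepts only `str` (text) and encodes it to UTF-8; in Python 2 the native `str` is a byte string. A minimal sketch of a normalizing helper, assuming UTF-8 names; `to_native_str` is a hypothetical name, not part of llvm-py:

```python
import sys

PY3 = sys.version_info[0] >= 3


def to_native_str(name, encoding="utf-8"):
    """Return `name` as the native str type expected by the "s" format.

    Hypothetical helper for illustration; not part of llvm-py.
    """
    if PY3:
        # Python 3: "s" accepts only str (text), so decode byte strings.
        if isinstance(name, bytes):
            return name.decode(encoding)
        return name
    # Python 2: the native str is a byte string, so encode unicode text.
    if not isinstance(name, str):
        return name.encode(encoding)
    return name
```

Normalizing at the Python layer keeps the wrappers' `"OsO"` format unchanged; the alternative is to change the C side to accept bytes (for example with the `y` format in Python 3).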

When adding Python 3 support to llvm-py, we should focus on these C API
changes.

*Timeline*

Before the coding period starts, I will study the llvm-py source code in
depth and read the LLVM 3 documentation and code, to speed up the project.

The coding period is divided into two stages: before the midterm
evaluation, I will port llvm-py to LLVM 3; after the midterm, I will add
Python 3 support to llvm-py.

May 21 ~ May 27: Support the LLVM 3 IR type system.
May 28 ~ June 3: Support the new Value and Instruction subclasses.
June 4 ~ June 10: Update the pass framework bindings.
June 11 ~ June 17: Improve PHINode support.
June 18 ~ June 24: Handle other features, such as intrinsics.
June 25 ~ July 1: Test and stabilize LLVM 3 support.
July 2 ~ July 8: Document LLVM 3 support in llvm-py.
July 9 ~ July 15: Midterm evaluation.
July 16 ~ July 22: Add Python 3 support and get it basically working.
July 23 ~ July 29: Debug and improve Python 3 support.
July 30 ~ August 5: Test and stabilize Python 3 support.
August 6 ~ August 12: Document Python 3 support.

*Project experience*

In GSoC 2009, I took part in a project to support the Scilab language in
SWIG [5]. I added a Scilab backend module to SWIG so that it supports all
the C features for Scilab: variables, functions, constants, enums, structs,
unions, pointers and arrays.

In GSoC 2010, I successfully completed a project called “epfs” [6]
(embedding Python from Scilab). This project introduces a mechanism to load
and use Python code from Scilab.

I have about one year of experience with LLVM. I use it mainly to implement
control-flow integrity for operating systems, to improve system security.
I recently submitted a patch to Target.h to improve compatibility with
SWIG, which has been applied to trunk.


*Biography*
Name: Baozeng Ding
University: Institute of Software, Chinese Academy of Sciences
Email: sploving1 at gmail.com
IRC name: sploving

*References*
[1]. http://code.google.com/p/llvm-py/
[2]. http://www.python.org/dev/peps/pep-3121/
[3]. http://www.python.org/dev/peps/pep-3123/
[4]. http://www.python.org/dev/peps/pep-0332/
[5]. http://code.google.com/p/google-summer-of-code-2009-swig/downloads/list
[6]. http://forge.scilab.org/index.php/p/epfs/

*Appendix*

*llvm-py Implementation*

Here I give a small example to show the relationship between the Python
function in llvm-py and the C function in LLVM.

Let us analyze an example in llvm-py:

    f_sum = my_module.add_function(ty_func, "sum")

How does this statement end up calling the LLVM C function?

The llvm-py package has six modules, of which the most important is the
core module, consisting of the following files:

*core.py*: high-level support code
*_core.c*: low-level wrapper code for the LLVM Core libraries
*wrap.h*: header includes and helper macros used by the low-level wrapper
code

In *core.py*, the class "Module" has a method "add_function", defined as
follows:

    def add_function(self, ty, name):
        """Add a function of given type with given name."""
        return Function.new(self, ty, name)

This method calls *Function.new*, the constructor of class *Function*. It
is also defined in *core.py*:

    class Function(GlobalValue):

        @staticmethod
        def new(module, func_ty, name):
            check_is_module(module)
            check_is_type(func_ty)
            return _make_value(_core.LLVMAddFunction(module.ptr, name,
                                                     func_ty.ptr))

The most important statement in the above constructor is:

    _core.LLVMAddFunction(module.ptr, name, func_ty.ptr)

If you are familiar with C extensions for Python, you can guess that
LLVMAddFunction is defined in the low-level wrapper file *_core.c*. Let's
see how. In *_core.c*, the relevant statements are:

    static PyMethodDef core_methods[] = {
        ...

        /* Functions */
        _method( LLVMAddFunction )
        ...
    };

LLVMAddFunction is registered through the macro *_method*, which is defined
in _core.c as:

    #define _method( func )     { # func , _w ## func , METH_VARARGS },

In this macro, func is the name used in Python, and _w ## func is the name
of the corresponding C wrapper function: the # operator turns func into a
string literal, and ## pastes the prefix _w onto it. So
_method( LLVMAddFunction ) expands to
{ "LLVMAddFunction" , _wLLVMAddFunction , METH_VARARGS }, and calling
LLVMAddFunction from Python actually calls the C wrapper function
_wLLVMAddFunction. How, then, is _wLLVMAddFunction defined?

Also in *_core.c*, there is the following statement related to
LLVMAddFunction:

    _wrap_objstrobj2obj(LLVMAddFunction, LLVMModuleRef, LLVMTypeRef,
                        LLVMValueRef)

This macro is defined in *wrap.h*:

    /**
     * Wrap LLVM functions of the type
     * outtype func(intype1 arg1, const char *arg2, intype3 arg3)
     */
    #define _wrap_objstrobj2obj(func, intype1, intype3, outtype)     \
    static PyObject *                                                \
    _w ## func (PyObject *self, PyObject *args)                      \
    {                                                                \
        PyObject *obj1, *obj3;                                       \
        intype1 arg1;                                                \
        const char *arg2;                                            \
        intype3 arg3;                                                \
                                                                     \
        if (!PyArg_ParseTuple(args, "OsO", &obj1, &arg2, &obj3))     \
            return NULL;                                             \
                                                                     \
        arg1 = ( intype1 ) PyCObject_AsVoidPtr(obj1);                \
        arg3 = ( intype3 ) PyCObject_AsVoidPtr(obj3);                \
                                                                     \
        return ctor_ ## outtype ( func (arg1, arg2, arg3));          \
    }

So the _wrap_objstrobj2obj(LLVMAddFunction, ...) statement expands to:

    static PyObject *
    _wLLVMAddFunction (PyObject *self, PyObject *args)  /* This is what we are looking for! */
    {
        PyObject *obj1, *obj3;
        LLVMModuleRef arg1;
        const char *arg2;
        LLVMTypeRef arg3;

        if (!PyArg_ParseTuple(args, "OsO", &obj1, &arg2, &obj3))
            return NULL;

        arg1 = ( LLVMModuleRef ) PyCObject_AsVoidPtr(obj1);
        arg3 = ( LLVMTypeRef ) PyCObject_AsVoidPtr(obj3);

        return ctor_LLVMValueRef( LLVMAddFunction (arg1, arg2, arg3));
    }

This is the function *_wLLVMAddFunction* we were looking for. Its last
statement is:

    return ctor_LLVMValueRef( LLVMAddFunction (arg1, arg2, arg3));

So the C function that my_module.add_function in the example finally calls
is *LLVMAddFunction*, which is declared in the LLVM C API header
*llvm-c/Core.h*:

    LLVMValueRef LLVMAddFunction(LLVMModuleRef M, const char *Name,
                                 LLVMTypeRef FunctionTy);

-- 
Best Regards,
Baozeng Ding

OSTG,NFS,ISCAS