[LLVMdev] GSoC 2012 Proposal: Python bindings for LLVM
Baozeng
sploving1 at gmail.com
Wed Mar 28 07:31:46 PDT 2012
Hello all,
Here is my GSoC 2012 proposal: Python bindings for LLVM. Any feedback is
welcome!
*Title: Python bindings for LLVM*
*Abstract:* llvm-py provides Python bindings for LLVM. The latest llvm-py
supports only Python 2.x and LLVM 2.x. This project will update llvm-py so
that it works with LLVM 3 under both Python 2.x and Python 3.
*Motivation*
LLVM is used as both a static compiler backend and a dynamic (JIT) backend
on many platforms. It has a modular design and provides extensive
optimization support. llvm-py provides Python bindings for LLVM [1]. The
project began in 2008 and aims to expose enough of the LLVM APIs to
implement a compiler backend in pure Python. The latest llvm-py works only
with LLVM 2.x, not LLVM 3. Since LLVM 3 introduces several major changes,
especially to its internal APIs, llvm-py must be updated to work with it.
In addition, the current llvm-py supports only Python 2.x, not Python 3.
Supporting Python 3 would make llvm-py more complete and open LLVM to more
users, which in turn helps LLVM's development. This project therefore has
two tasks: make llvm-py work with LLVM 3, and add Python 3 support.
*Project Detail*
Before writing this proposal, I read the llvm-py source code to get a basic
understanding of how it works, and wrote a short document analyzing its
implementation (please see the appendix at the end of this proposal).
In this section, I list some details related to this project: first the
changes needed to work with LLVM 3, then the changes needed for Python 3
support.
*1. Working with LLVM 3*
LLVM 3 contains some internal API changes, so the llvm-py code should be
changed to be consistent with the modified APIs.
a. IR type system. The IR type system is reimplemented in LLVM 3. For
instance, *OpaqueType* is gone, so this type should also be removed from
llvm-py.
b. Value class. Two new subclasses of Value are added:
*ConstantDataArray*, an array constant;
*ConstantDataVector*, a vector constant.
llvm-py should expose both.
c. Instruction class. Four new subclasses of Instruction are added:
*FenceInst*, an instruction for ordering other memory operations;
*AtomicCmpXchgInst*, an instruction that atomically checks and exchanges
values in a memory location;
*AtomicRMWInst*, an instruction that atomically reads a memory location,
combines it with another value, and stores the result back;
*LandingPadInst*, an instruction that holds the information necessary to
generate correct exception handling.
llvm-py should support them.
d. Passes. Some passes are removed, for instance the *LowerSetJmp* pass. The
corresponding API functions, such as LLVMAddLowerSetJmpPass, should
therefore also be removed from llvm-py.
e. PHINode. Two new functions are added to the PHINode class: *block_begin*
and *block_end*. The list of incoming BasicBlocks can be accessed with
them. At the same time, the reserveOperandSpace function is removed, so
creating a PHINode now takes an extra argument that specifies how many
operands to reserve space for.
These changes are the main focus when making llvm-py work with LLVM 3.0.
The list above may not be complete; I will cover further changes during the
project.
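As a rough illustration of item d, the high-level Python layer could check
whether the low-level module still exports a pass wrapper before using it.
This is only a sketch under assumptions: FakeCore stands in for the compiled
_core module, and PASS_WRAPPERS, available_passes and add_pass are
hypothetical names that do not exist in llvm-py.

```python
# Hypothetical stand-in for the low-level _core extension module built
# against LLVM 3: the LowerSetJmp wrapper no longer exists there.
class FakeCore(object):
    @staticmethod
    def LLVMAddInstructionCombiningPass(pass_manager):
        return "instcombine added to %s" % pass_manager


# Wrapper names the high-level layer would like to expose (illustrative).
PASS_WRAPPERS = [
    "LLVMAddInstructionCombiningPass",
    "LLVMAddLowerSetJmpPass",  # removed in LLVM 3
]


def available_passes(core, names):
    """Return only the wrapper names that the given core module exports."""
    return [name for name in names if hasattr(core, name)]


def add_pass(core, name, pass_manager):
    """Call a pass wrapper by name, failing clearly if it was removed."""
    wrapper = getattr(core, name, None)
    if wrapper is None:
        raise NotImplementedError("%s is not available in this LLVM" % name)
    return wrapper(pass_manager)
```

With this shape, code written against LLVM 2.x fails with a clear message
instead of an AttributeError deep inside the binding.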
*2. Python 3 support*
When adding support for Python 3, we also should pay attention to the C API
changes between Python 2.x and Python 3. Here I list some of them.
1. Extension module initialization and finalization (PEP 3121) [2]
In Python 3, the module initialization routines should look like this:
*PyObject *PyInit_<modulename>()*
The module itself is created with PyModule_Create, which takes a pointer to
a struct PyModuleDef as its parameter.
2. Making PyObject_HEAD conform to standard C (PEP 3123) [3]
Some macros are added, for instance *Py_TYPE, Py_REFCNT, Py_SIZE*. So an
expression such as *func->ob_type->tp_name* in Python 2.x should be replaced
with *Py_TYPE(func)->tp_name* in Python 3.
3. Byte vectors and String/Unicode Unification (PEP 0332) [4]
The C-level support functions of the Python 2 *str* type (PyString_*) are
gone in Python 3: byte strings are now handled by the *bytes* type, and
*str* is always Unicode.
When supporting Python 3 in llvm-py, we should focus on these C API
changes.
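The string change matters for llvm-py because names such as "sum" cross from
Python into C functions that expect a const char *. Below is a minimal
sketch of a helper that normalizes such names on both Python versions. The
helper name to_c_name and the choice of UTF-8 are assumptions made for
illustration; neither comes from llvm-py.

```python
import sys

PY3 = sys.version_info[0] >= 3

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
text_type = str if PY3 else unicode  # noqa: F821 (unicode exists on Python 2 only)


def to_c_name(name, encoding="utf-8"):
    """Return `name` as a byte string suitable for a C `const char *`.

    Text is encoded; byte strings pass through unchanged.
    """
    if isinstance(name, text_type):
        return name.encode(encoding)
    if isinstance(name, bytes):
        return name
    raise TypeError("expected text or bytes, got %s" % type(name).__name__)
```

A wrapper that parses its argument as bytes could then be called with
to_c_name(name) regardless of which Python version is running.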
*Timeline*
Before the coding period starts, I will analyze the llvm-py source code in
depth and read the LLVM 3 documentation and code to speed up the project.
The coding period is divided into two stages: before the midterm
evaluation, I will port llvm-py to LLVM 3; after the midterm, I will add
Python 3 support to llvm-py.
May 21 ~ May 27: Support IR Type System for LLVM 3
May 28 ~ June 3: Support new Value subclasses and Instruction subclasses
June 4 ~ June 10: Deal with Pass Framework
June 11 ~ June 17: Improve PHINode class support
June 18 ~ June 24: Deal with other features, such as intrinsics
June 25 ~ July 1: Test and get LLVM 3 support into good shape
July 2 ~ July 8: Document LLVM 3 support for llvm-py
July 9 ~ July 15: Midterm evaluation
July 16 ~ July 22: Add Python 3 support and make it basically work
July 23 ~ July 29: Debug and improve Python 3 support
July 30 ~ August 5: Test and get Python 3 support into good shape
August 6 ~ August 12: Document Python 3 support
*Project experience*
In GSoC 2009, I took part in a project to support the Scilab language in
SWIG [5]. I added a backend module to SWIG so that it supports all the C
features for Scilab: variables, functions, constants, enums, structs,
unions, pointers and arrays.
In GSoC 2010, I also successfully finished a project called "epfs" [6],
which means embedding Python from Scilab. This project introduces a
mechanism to load and use Python code from Scilab.
I have about one year of experience with LLVM. I use it mainly to implement
control flow integrity for operating systems and thus improve system
security.
I recently submitted a patch to the Target.h file to improve compatibility
with SWIG, which has been applied to trunk.
*Biography*
Name: Baozeng Ding
University: Institute of Software, Chinese Academy of Sciences
Email: sploving1 at gmail.com
IRC name: sploving
*References*
[1]. http://code.google.com/p/llvm-py/
[2]. http://www.python.org/dev/peps/pep-3121/
[3]. http://www.python.org/dev/peps/pep-3123/
[4]. http://www.python.org/dev/peps/pep-0332/
[5]. http://code.google.com/p/google-summer-of-code-2009-swig/downloads/list
[6]. http://forge.scilab.org/index.php/p/epfs/
*Appendix*
*llvm-py Implementation*
Here I give a small example to show the relationship between a Python
function in llvm-py and the corresponding C function in LLVM.
Let us analyze an example from llvm-py:
*f_sum = my_module.add_function(ty_func, "sum")*
How does the above statement end up calling the LLVM C function?
The llvm-py package has six modules, of which the most important is the
core module, consisting of the following files:
*core.py*   high-level support code
*_core.c*   low-level wrapper code for the LLVM Core libraries
*wrap.h*    includes the header files needed by the low-level wrapper code
In *core.py*, there is a class "Module", which has a method "add_function",
defined as follows:
*def add_function(self, ty, name):
    """Add a function of given type with given name."""
    return Function.new(self, ty, name)*
This method calls the constructor of class "*Function*" (Function.new),
which is also defined in *core.py*:
*class Function(GlobalValue):
    @staticmethod
    def new(module, func_ty, name):
        check_is_module(module)
        check_is_type(func_ty)
        return _make_value(_core.LLVMAddFunction(module.ptr, name,
                                                 func_ty.ptr))*
The most important statement in the above constructor is:
*_core.LLVMAddFunction(module.ptr, name, func_ty.ptr)*
If you are familiar with C extensions for Python, you can guess that
LLVMAddFunction is defined in the low-level wrapper file *_core.c*.
In *_core.c*, the following statements are what we are looking for:
*static PyMethodDef core_methods[] = {
    ...
    /* Functions */
    _method( LLVMAddFunction )
    ...
};*
LLVMAddFunction is registered through the macro _method, which is defined
in _core.c:
*#define _method( func ) { # func , _w ## func , METH_VARARGS },*
In this macro, # func is the name used in Python, and _w ## func is the
name of the corresponding wrapper function. That is, when we call a
function func in Python, it actually calls the C wrapper function
_w ## func. So when we use the LLVMAddFunction method in Python, it calls
_wLLVMAddFunction. How, then, is _wLLVMAddFunction defined?
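Before answering that, the naming rule applied by _method can be mimicked in
a few lines of Python. This is only an illustration of the preprocessor's
stringizing (#) and token pasting (##) operators; the function name
expand_method and the tuple layout are made up for this sketch and are not
part of llvm-py.

```python
def expand_method(func):
    """Mimic `_method(func)`: return (python_name, wrapper_name, flags).

    `# func` turns the macro argument into a string literal, and
    `_w ## func` pastes the prefix `_w` onto the argument to form the
    identifier of the C wrapper function.
    """
    python_name = func            # result of `# func`
    wrapper_name = "_w" + func    # result of `_w ## func`
    return (python_name, wrapper_name, "METH_VARARGS")


entry = expand_method("LLVMAddFunction")
print(entry)  # ('LLVMAddFunction', '_wLLVMAddFunction', 'METH_VARARGS')
```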
Also in *_core.c*, there is a statement that refers to LLVMAddFunction:
*_wrap_objstrobj2obj(LLVMAddFunction, LLVMModuleRef, LLVMTypeRef,
LLVMValueRef)*
This macro is defined in wrap.h:
*/**
 * Wrap LLVM functions of the type
 * outtype func(intype1 arg1, const char *arg2, intype3 arg3)
 */
#define _wrap_objstrobj2obj(func, intype1, intype3, outtype) \
static PyObject * \
_w ## func (PyObject *self, PyObject *args) \
{ \
    PyObject *obj1, *obj3; \
    intype1 arg1; \
    const char *arg2; \
    intype3 arg3; \
    \
    if (!PyArg_ParseTuple(args, "OsO", &obj1, &arg2, &obj3)) \
        return NULL; \
    \
    arg1 = ( intype1 ) PyCObject_AsVoidPtr(obj1); \
    arg3 = ( intype3 ) PyCObject_AsVoidPtr(obj3); \
    \
    return ctor_ ## outtype ( func (arg1, arg2, arg3)); \
}*
So the above statement expands to:
*static PyObject *
_wLLVMAddFunction (PyObject *self, PyObject *args) // This is what we are looking for!
{
    PyObject *obj1, *obj3;
    LLVMModuleRef arg1;
    const char *arg2;
    LLVMTypeRef arg3;

    if (!PyArg_ParseTuple(args, "OsO", &obj1, &arg2, &obj3))
        return NULL;

    arg1 = ( LLVMModuleRef ) PyCObject_AsVoidPtr(obj1);
    arg3 = ( LLVMTypeRef ) PyCObject_AsVoidPtr(obj3);

    return ctor_LLVMValueRef( LLVMAddFunction (arg1, arg2, arg3));
}*
We get the function *_wLLVMAddFunction* that we are looking for. As shown
in the last statement of this function:
*return ctor_LLVMValueRef( LLVMAddFunction (arg1, arg2, arg3));*
we finally reach the C function that my_module.add_function in the example
calls: *LLVMAddFunction*, which is declared in the file *Core.h* of the
LLVM libraries:
*LLVMValueRef LLVMAddFunction(LLVMModuleRef M, const char *Name,
LLVMTypeRef FunctionTy);*
--
Best Regards,
Baozeng
Ding
OSTG,NFS,ISCAS