[LLVMdev] [Stackless] [C++-sig] [Boost] Trouble optimizing Boost.Python integration for game development (it seems too slow)

OvermindDL1 overminddl1 at gmail.com
Wed Aug 26 04:20:19 PDT 2009


On Wed, Aug 26, 2009 at 4:41 AM, Dan Sanduleac <sanduleac.dan at gmail.com> wrote:
> Oh, I see. Didn't think of this, thanks!
>
> So, just to be clear, there's no binding overhead in Cython because the
> functions defined there are pure Python, right? (The function objects, I
> mean.) Whereas the ones I defined with Boost are more expensive to call.
>
> On Wed, Aug 26, 2009 at 1:02 PM, OvermindDL1 <overminddl1 at gmail.com> wrote:
>>
>> On Wed, Aug 26, 2009 at 3:31 AM, Dan Sanduleac <sanduleac.dan at gmail.com> wrote:
>> > Hi,
>> >
>> > I'm trying to compare different Python-C wrapping techniques to see which
>> > would be faster and also better suited to game development.
>> > I'm using Stackless Python 2.6.2 at 74550 / GCC 4.3.3, and Boost 1.39.0 on
>> > Ubuntu 9.04. I implemented a simple Vec3 (3D vector) class in C++ and
>> > wrapped it with boost::python. All it does is multiplications and additions,
>> > so it implements just two operators for Python.
>> > The thing is, it turns out to be rather slow compared to equivalent
>> > Cython/Pyrex code, even though I expected it to run faster than Cython.
>> > (Note: Cython is not an abbreviation for C/Python API.)
>> > I compiled the Boost.Python library in release mode and then linked the
>> > vec3 module, whose code is provided below, against it. (I used -O2 when
>> > compiling the vec3 module.)
>> >
>> > The test goes like this: each "tick", 10000 objects update their position
>> > according to their velocity and the time delta since the last tick, and
>> > I measure the average time a tick takes to complete.
>> > On my machine this takes ~0.026 sec/tick with Cython, versus about
>> > 0.052 sec/tick with Boost.Python.
>> > (Python's own overhead for iterating through the list of objects each
>> > tick is about 0.01 sec.)
>> > During one tick, for each object, Python runs "self.position +=
>> > self.velocity * time_delta", where position and velocity are instances
>> > of Vec3.
>> >
>> > I was hoping Boost would give better results than Cython. Am I doing
>> > something wrong?
>> >
>> >
>> >
>> > Source code:
>> > vec3.cpp
>> > ==========
>> > #include <boost/python.hpp>
>> > using namespace boost::python;
>> >
>> > class Vec3 {
>> >
>> >   float x, y, z;
>> >
>> > public:
>> >   Vec3(float x, float y, float z);
>> >   Vec3 &operator*=(float scalar);
>> >   Vec3 operator*(float scalar) const;
>> >   Vec3 &operator+=(const Vec3 &who);
>> >   // that `const Vec3 &` is REALLY needed, unless you want a monsoon of
>> >   // compile errors to come down
>> > };
>> >
>> > // === boost:python wrapper ===
>> > // publish just += and * to python
>> >
>> > BOOST_PYTHON_MODULE(vec3)
>> > {
>> >   class_<Vec3>("Vec3", init<float, float, float>())
>> >       .def(self += self)
>> >       .def(self * float())
>> >   ;
>> > }
>> >
>> > // === implementation ===
>> >
>> > Vec3::Vec3(float x, float y, float z) {
>> >   this->x = x;
>> >   this->y = y;
>> >   this->z = z;
>> > }
>> >
>> > Vec3 & Vec3::operator*=(float scalar) {
>> >   this->x *= scalar;
>> >   this->y *= scalar;
>> >   this->z *= scalar;
>> >   return *this;
>> > }
>> >
>> > Vec3 Vec3::operator*(float scalar) const {
>> >   return Vec3(*this) *= scalar;
>> > }
>> >
>> > Vec3 & Vec3::operator+=(const Vec3 &who) {
>> >   this->x += who.x;
>> >   this->y += who.y;
>> >   this->z += who.z;
>> >   return *this;
>> > }
>> >
>> > ==============================================
>> > vec3.pyx (cython code, for reference)
>> > ===========================
>> > cdef class Vec3:
>> >
>> >     cdef readonly double x, y, z
>> >
>> >     def __cinit__(Vec3 self, double x, double y, double z):
>> >         self.x, self.y, self.z = x, y, z
>> >
>> >     # operator *
>> >     def __mul__(Vec3 self, double arg):
>> >         return Vec3(self.x*arg, self.y*arg, self.z*arg)
>> >
>> >     # operator +=
>> >     def __iadd__(Vec3 self, Vec3 arg):
>> >         #if not isinstance(arg, Vec3):
>> >         #    return NotImplemented
>> >         self.x += arg.x
>> >         self.y += arg.y
>> >         self.z += arg.z
>> >         return self
>>
>> That is because Boost.Python is designed to be an easy-to-use, safe,
>> and powerful binder, not a fast one. It is good for binding things that
>> are long-lived, not something as quick as a simple operation like an
>> addition: the overhead of the call will be much higher than the cost of
>> the actual operation. Something like Cython does not have that
>> limitation; although it produces slower code, the binding overhead
>> does not exist.
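A minimal sketch of one way to act on this, under stated assumptions: instead of exposing `*` and `+=` and calling them twice per object from Python, move the whole per-tick update into a single C++ function, so Python crosses the binding once per tick rather than 20000 times. The name `update_positions`, the plain `Vec3` struct with public members, and the vector-based layout are all hypothetical, not part of the module in the post. Exposing a `std::vector<Vec3>` to Python would additionally need a container converter (for example Boost.Python's `vector_indexing_suite`), which is left out here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Vec3 in the post, with public members so the
// batch function can update the components directly.
struct Vec3 {
    float x, y, z;
};

// Hypothetical batch entry point: advances every position by velocity * dt in
// one call, so any per-call binding overhead is paid once per tick instead of
// once (or twice) per object.
void update_positions(std::vector<Vec3>& positions,
                      const std::vector<Vec3>& velocities,
                      float dt) {
    // Only process pairs that exist in both containers.
    const std::size_t n = std::min(positions.size(), velocities.size());
    for (std::size_t i = 0; i < n; ++i) {
        positions[i].x += velocities[i].x * dt;
        positions[i].y += velocities[i].y * dt;
        positions[i].z += velocities[i].z * dt;
    }
}
```

The design choice is simply to make each call across the binding do enough work to outweigh its fixed cost; the arithmetic itself is the same as `position += velocity * dt`.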

Boost.Python's binding layer includes an exception mechanism (which
translates between C++ and Python exceptions) and a converter registry
(which ensures every argument and return value is converted in a
type-safe way); both add some overhead to every call that a more direct
call does not have. Basically, if a call through Boost.Python does
enough work to outweigh that overhead, it is a good choice, and
certainly an easier one, but for a very short call like the one above
it is not the best tool. Everything has its place.  :)
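To make the exception-translation step more concrete, here is a simplified, hypothetical model of what a binding layer does around each wrapped call: run the C++ code inside a guard and, if it throws, turn the exception into an error the interpreter can raise instead of letting it unwind through Python's C code. This is not Boost.Python's actual implementation; `guarded_call` and `PyErrorStub` are made-up names, and the error is recorded in a plain struct rather than set as a real Python exception. The mapping (out_of_range to IndexError, invalid_argument to ValueError, anything else to RuntimeError) loosely follows Boost.Python's default translations.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Python's error indicator: exception type name
// plus message.
struct PyErrorStub {
    std::string type;
    std::string message;
};

// Simplified model of a per-call exception guard. Runs `body`; on success
// returns true, otherwise records a translated error in `err` and returns
// false. More specific exception types are caught first.
bool guarded_call(const std::function<void()>& body, PyErrorStub& err) {
    try {
        body();
        return true;
    } catch (const std::out_of_range& e) {
        err = PyErrorStub{"IndexError", e.what()};
    } catch (const std::invalid_argument& e) {
        err = PyErrorStub{"ValueError", e.what()};
    } catch (const std::exception& e) {
        err = PyErrorStub{"RuntimeError", e.what()};
    } catch (...) {
        err = PyErrorStub{"RuntimeError", "unidentifiable C++ exception"};
    }
    return false;
}
```

Every wrapped function goes through a step like this, plus argument conversion, before any of its own work runs, which is why a body as small as three float additions is dominated by the surrounding machinery.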




More information about the llvm-dev mailing list