[LLVMdev] Python bindings available.

Mon May 12 07:25:19 PDT 2008

On May 12, 2008, at 02:58, Mahadevan R wrote:

>> Consider the case where a function creates and populates a Module,  
>> stuffs it in an ExistingModuleProvider for the JIT, then returns  
>> the ModuleProvider, dropping direct reference to the Module.  
>> (ModuleProvider takes ownership of the Module.) I presume that your  
>> Python object is under the impression it owns the Module; when that  
>> goes out of scope, its refcount goes to zero and it invokes its  
>> dtor, disposing of the Module. D'oh— now the ModuleProvider has a  
>> dangling pointer. :)
>
> Ah. Good one. Would the following fix it?
>
> 1) Have ModuleProvider maintain a reference to the Module it owns,
>     so that the ref count is at least 1 at any time. This is easily  
> done.
>     The only thing left is when an MP goes away, the module's dtor
>     will be called first, deleting the module, then the MP's dtor will
>     be called, which will try to delete the same module again.
>
> 2a) So either we can prevent the actual destruction of modules that
>    are attached to MPs, or
>
> 2b) Do not do anything in the dtors of MPs (while letting the dtor of
>    modules do the work)

The ModuleProvider itself also needs to be destroyed or you have a  
memory leak.

> Both options have the disadvantage of assuming the C/C++  
> implementation (like MP::dtor deletes only the module and nothing  
> else).
>
>> The routine LLVMModuleRef LLVMGetGlobalParent(LLVMValueRef Global);  
>> poses a related problem; in this case, the returned reference is  
>> non-owning, so you must not dtor it from Python.
>
> If I do this:
>
>  m1 = Module.new()
>  g1 = m1.add_global_variable(ty, "name")
>  m2 = g1.module
>
> will the LLVMModuleRef pointer returned in the last call be the
> same as that of m1? If so probably we can get "g1.module" to
> return the original object itself.

It will, so that's a possibile.

>> The fix, of course, is providing a dispose routine and requiring  
>> the user to call it, since you can't know what they've done with  
>> the pointer.
>
> It'd be much easier to use it without an explicit destruction call.  
> I'd prefer to do it only if there's absolutely no other go.

It's certainly the easiest option. The thing to remember is that C++  
won't play with your garbage collector at all.

The only other option I can see is to

  1. Add an 'owns pointer?' flag to each Python wrapper object.
  2. Update that flag dynamically as the value is used, in accordance  
with the C++ ownership semantics.
  3. Maintain a LLVMFooRef -> Python object table.

So you would reset the 'owns pointer?' flag on m if the user calls  
ModuleProvider.new(m).

The table is necessary, else mp = ModuleProvider.new(m2) [using your  
example fragment above] would trigger a double free by mp and m1.  
Python users are probably not performance sensitive to routing lookups  
through the dictionary, so this might be quite workable.

But even with all that work, you'll still have the possibility of  
surprising results:

   def f(m):
     mp = ModuleProvider.new(m)
     # do something useful?

   m = Module.new("")
   f(m)
   print(m.name)  # Segfault, heap corruption, etc.

For Value and subclasses, you can perform the optimization that the  
Value is always owned by the Module and skip invoking all of this  
machinery. Likewise for Pass, if it ever gets bound.

— Gordon