[cfe-commits] [PATCH 1/4] [clang.py] Store reference to TranslationUnit in Cursor and Type

Thu May 17 02:14:32 PDT 2012

On Thu, May 17, 2012 at 10:33 AM, Manuel Klimek <klimek at google.com> wrote:
> On Thu, May 17, 2012 at 10:27 AM, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>> On Wed, May 16, 2012 at 11:12 AM, Manuel Klimek <klimek at google.com> wrote:
>>> On Tue, May 15, 2012 at 11:54 PM, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>>>> On Tue, May 15, 2012 at 12:55 PM, Manuel Klimek <klimek at google.com> wrote:
>>>>> On Tue, May 15, 2012 at 9:52 PM, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>>>>>> On Tue, May 15, 2012 at 12:45 PM, Manuel Klimek <klimek at google.com> wrote:
>>>>>>> On Tue, May 15, 2012 at 9:40 PM, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>>>>>>>> On Tue, May 15, 2012 at 12:31 PM, Manuel Klimek <klimek at google.com> wrote:
>>>>>>>>> On Mon, May 14, 2012 at 10:15 AM, Manuel Klimek <klimek at google.com> wrote:
>>>>>>>>>> On Mon, May 14, 2012 at 5:46 AM, Gregory Szorc <gregory.szorc at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 5/13/12 1:00 AM, Manuel Klimek wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +    @property
>>>>>>>>>>>> +    def translation_unit(self):
>>>>>>>>>>>> +        """Returns the TranslationUnit to which this Cursor belongs."""
>>>>>>>>>>>> +        return getattr(self, '_tu', None)
>>>>>>>>>>>>
>>>>>>>>>>>> What's the reason for the default value? Do we expect that people create
>>>>>>>>>>>> cursers via the lowlevel calls? Why would just return tu_ not work?
>>>>>>>>>>>
>>>>>>>>>>> Just defensive programming. If you return self._tu, that may raise an
>>>>>>>>>>> AttributeError if _tu is not set. Theoretically, the API should ensure that
>>>>>>>>>>> a TU is defined on all Cursor instances, so maybe an AttributeError would be
>>>>>>>>>>> acceptable.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'd prefer that - to me defensive programming means to:
>>>>>>>>>> 1. never crash for invalid user input
>>>>>>>>>> 2. crash hard if there's an actual bug in the library
>>>>>>>>>>
>>>>>>>>>> If the cursor should never be generated by a user (-> the contract of the
>>>>>>>>>> library is to not break that), I think we should use _tu and have the
>>>>>>>>>> AttributeError thrown if it's not there. Or I'm also fine with putting in an
>>>>>>>>>> assert that checks that the attribute is there and delivers a nicer error in
>>>>>>>>>> case it's missing.
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +        # Store a reference to the TU in the Python object so it won't
>>>>>>>>>>>> get GC'd
>>>>>>>>>>>> +        # before the Cursor.
>>>>>>>>>>>> +        tu = None
>>>>>>>>>>>> +        for arg in args:
>>>>>>>>>>>> +            if isinstance(arg, TranslationUnit):
>>>>>>>>>>>> +                tu = arg
>>>>>>>>>>>> +                break
>>>>>>>>>>>> +
>>>>>>>>>>>> +            if hasattr(arg, 'translation_unit'):
>>>>>>>>>>>> +                tu = arg.translation_unit
>>>>>>>>>>>> +                break
>>>>>>>>>>>> +
>>>>>>>>>>>> +        assert tu is not None
>>>>>>>>>>>> +
>>>>>>>>>>>> +        res._tu = tu
>>>>>>>>>>>>
>>>>>>>>>>>> That seems - odd. I can't find docs what from_result is supposed to do,
>>>>>>>>>>>> or what "args" are provided. But having to search through them for a TU
>>>>>>>>>>>> seems wrong - shouldn't they all have a TU?
>>>>>>>>>>>
>>>>>>>>>>> from_result is the errcheck function for the ctypes functions that return
>>>>>>>>>>> a Cursor. The 3rd argument to an errcheck function are the original
>>>>>>>>>>> arguments passed into the called function.  For many of the functions, the
>>>>>>>>>>> original argument is a Cursor. However, clang_getTypeDeclaration takes a
>>>>>>>>>>> Type and clang_getTranslationUnitCursor receives a TranslationUnit.
>>>>>>>>>>>
>>>>>>>>>>> It is true that all of the functions today receive a single argument, so
>>>>>>>>>>> the iteration isn't required. However, that may change with the addition of
>>>>>>>>>>> new APIs in the future (this is how I coded it in my nearly feature-complete
>>>>>>>>>>> branch, so I'm guessing it is actually required from a future patch).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The question is: don't we just want to assert that all of those have a TU
>>>>>>>>>> (or are a TU)?
>>>>>>>>>
>>>>>>>>> ^ this question is still open I think...
>>>>>>>>
>>>>>>>> I don't think it is. Both modified from_result functions now assert if
>>>>>>>> a TU could not be found.
>>>>>>>>
>>>>>>>> Or, are you asking for something more? In the future, every live
>>>>>>>> object will likely hold a reference to a TU. We aren't there quite
>>>>>>>> yet. But, this patch is a step in the right direction.
>>>>>>>
>>>>>>> Ah, ok, that was my question - why not every object has a TU and we
>>>>>>> can assert *for every object* that it has a TU. If that's a follow-up
>>>>>>> step, then LGTM.
>>>>>>
>>>>>> Well, in the case of from_result (or any errcheck function), not every
>>>>>> argument to the original function may have a TU attached. e.g. integer
>>>>>> arguments.
>>>>>
>>>>> So that means we cannot even be sure that we'll always find a TU?
>>>>> Still trying to understand from_result better.
>>>>
>>>> Don't feel bad. I was completely lost the first time I encountered
>>>> these as well.
>>>>
>>>> The various from_result functions are registered as 'errcheck'
>>>> functions for ctypes-registered functions. The function referenced by
>>>> the 'errcheck' parameter of a ctypes function (if defined) is called
>>>> by ctypes when the C function call has returned but before the result
>>>> is passed to the original caller. In addition, the return value from
>>>> the errcheck function is what is actually handed to the original
>>>> caller.
>>>>
>>>> The errcheck function receives 3 arguments: the result from the C
>>>> function call, a reference to the called function, and a tuple of
>>>> arguments passed into the C function.
>>>>
>>>> Think of errcheck functions as hooks that allow you to perform
>>>> additional validation and to modify the result of a low-level C
>>>> function call. It is perfectly valid for them to raise an exception if
>>>> the result is not what's expected, convert the C return value to
>>>> something else (by not returning the first argument as-is), or perform
>>>> modifications to the value returned from the C function call. If you
>>>> wanted, you could even use these to aggregate "out" arguments from
>>>> function calls into the return value.
>>>>
>>>> For example, for the function clang_getArgType(), the ctype function
>>>> reference has:
>>>>
>>>> .argtypes = [Type, c_uint]
>>>> .restype = Type
>>>> .errcheck = Type.from_result
>>>>
>>>> When you call clang_getArgType(), ctypes will do some magic to convert
>>>> the supplied arguments to the required types. When the C function has
>>>> returned, it will invoke Type.from_result() with a Type instance, the
>>>> ctype function instance, and a 2-tuple of (Type, c_uint).
>>>>
>>>> Honestly, we should probably have different errcheck functions for
>>>> each function prototype. Then, we could make stronger assertions on
>>>> the arguments. We cheat a little bit and use the same errcheck
>>>> functions on each class. It does work.
>>>>
>>>> Does that help explain things?
>>>
>>> Yes, thanks. I'm not sure the answer to my question is hidden in there:
>>> If the arguments don't have a TU, but we need the result to have a TU,
>>> we're basically screwed, right?
>>
>> Pretty much.
>>
>>> So wouldn't it make sense to make
>>> everything that depends on the TU "living" to be created by calls into
>>> the TU? I'm probably still missing something here... ;)
>>
>> I'm not sure if we need to go that far, as that would make for a
>> somewhat complicated API. If we copy references to the TU at C
>> function call time on newly-created Python objects, we should be fine.
>> We just need to ensure we don't miss any references.
>
> I'm still not entirely sure I understand this, but let me know whether
> this is true:
> libclang will never return values that depend on the TU being alive if
> it didn't have an argument that depends on the TU being alive or is
> the TU itself.
> If so, we're fine.

Correct. The TU instance will be buried in an opaque struct somewhere
(behind libclang's implementation) or will be the TU itself. Of
course, if the TU isn't alive, libclang will likely segfault (and not
return any value).