[cfe-dev] cindex.py: can I add TokenGroup.get_rtokens()

Wed Dec 10 13:10:51 PST 2014

Hi there,

I've only just started using clang/cindex.py as a parser and I want to be
able to get doxygen style comments associated with #defines in C source
code. cursor.raw_comment doesn't seem to be populated for preprocessor
defines, so in order to fetch doxygen style comments I've done this:

import clang.cindex

# MACRO_DEFINITION cursors are only reported when the translation unit is
# parsed with clang.cindex.TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD.
for child in cursor.get_children():
    if child.kind == clang.cindex.CursorKind.MACRO_DEFINITION:

        # Get the offset where the preprocessor define starts
        start = child.extent.start.offset

        # Get the extent from the start of the source file up to the
        # define. Not sure why I need start-2 instead of start.
        extent = tu.get_extent(path_of_source, (0, start - 2))

        # Get the list of tokens before the preprocessor define
        tokens = clang.cindex.TokenGroup.get_tokens(tu, extent)

        # Reverse the generator so that we can walk backwards through
        # the token list to extract comments before the preprocessor
        # definition. This is painfully slow.
        tokens = reversed(list(tokens))

        comment = None

        for t in tokens:
            if t.spelling in ('#', 'define'):
                continue
            elif t.kind == clang.cindex.TokenKind.COMMENT:
                comment = t.spelling
            break

        if comment is not None and comment.startswith('/**'):
            pass  # process comment for this preprocessor statement
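
The backward scan in the loop above can be tried out without libclang. This
is only a sketch: `Tok` is a hypothetical stand-in for clang.cindex.Token with
just the two fields the scan reads, and the string 'COMMENT' plays the role
of clang.cindex.TokenKind.COMMENT. It skips the '#' and 'define' tokens and
accepts the next token only if it is a '/**' comment.

```python
from collections import namedtuple

# Hypothetical stand-in for clang.cindex.Token: only kind and spelling.
# The string 'COMMENT' stands in for clang.cindex.TokenKind.COMMENT.
Tok = namedtuple('Tok', ['kind', 'spelling'])


def preceding_doc_comment(tokens):
    """Return the doxygen comment directly before a #define, or None.

    `tokens` is a list of tokens ending just before the macro name, so the
    last two entries are normally '#' and 'define'.
    """
    # reversed() on a list walks it back to front without copying it.
    for tok in reversed(tokens):
        if tok.spelling in ('#', 'define'):
            continue  # skip the directive tokens themselves
        if tok.kind == 'COMMENT' and tok.spelling.startswith('/**'):
            return tok.spelling
        return None  # nearest other token is not a doc comment
    return None


toks = [Tok('KEYWORD', 'int'), Tok('IDENTIFIER', 'x'), Tok('PUNCTUATION', ';'),
        Tok('COMMENT', '/** Buffer size. */'),
        Tok('PUNCTUATION', '#'), Tok('IDENTIFIER', 'define')]
print(preceding_doc_comment(toks))  # /** Buffer size. */
```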

I'd like to get the token list in reverse order so that I can walk
backwards looking for comments associated with the #define. Building a list
from the generator just to reverse it has a lot of overhead in Python when
the token list is large, so I've added a method called get_rtokens() to the
TokenGroup class that returns the tokens in reverse order:

class TokenGroup(object):
    ...
    @staticmethod
    def get_rtokens(tu, extent):
        """Helper method to return all tokens in an extent in reverse order
           to avoid the expense of having to convert the returned generator
           to a list and then calling reversed() on it.
        """
        # This lives inside clang/cindex.py, where POINTER, c_uint, byref,
        # cast, conf and Token are already in scope.
        tokens_memory = POINTER(Token)()
        tokens_count = c_uint()

        conf.lib.clang_tokenize(tu, extent, byref(tokens_memory),
                byref(tokens_count))

        count = int(tokens_count.value)

        # If we get no tokens, no memory was allocated. Be sure not to return
        # anything and potentially call a destructor on nothing.
        if count < 1:
            return

        tokens_array = cast(tokens_memory, POINTER(Token * count)).contents

        token_group = TokenGroup(tu, tokens_memory, tokens_count)

        # Walk from the last index down to and including 0; a stop value
        # of 0 would silently skip the first token.
        for i in xrange(count - 1, -1, -1):
            token = Token()
            token.int_data = tokens_array[i].int_data
            token.ptr_data = tokens_array[i].ptr_data
            token._tu = tu
            token._group = token_group

            yield token
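
The index arithmetic in get_rtokens() can be checked in isolation with plain
ctypes. In this sketch `FakeToken` is a hypothetical one-field structure
standing in for cindex's Token, and a ctypes array stands in for the memory
clang_tokenize would allocate. The point it shows is that
range(count - 1, -1, -1) visits every index down to and including 0, whereas
a stop value of 0 drops the first token.

```python
from ctypes import Structure, POINTER, c_int, cast


class FakeToken(Structure):
    # Hypothetical stand-in for clang.cindex.Token; real tokens carry
    # int_data/ptr_data filled in by clang_tokenize.
    _fields_ = [('value', c_int)]


def iter_reversed(tokens_memory, count):
    """Yield `count` FakeToken structs from a POINTER(FakeToken), last first."""
    if count < 1:
        return
    # Same cast as get_rtokens(): view the raw pointer as a fixed-size array.
    tokens_array = cast(tokens_memory, POINTER(FakeToken * count)).contents
    # The stop value -1 is exclusive, so index 0 is still visited.
    for i in range(count - 1, -1, -1):
        yield tokens_array[i]


buf = (FakeToken * 4)(*[FakeToken(v) for v in (10, 20, 30, 40)])
mem = cast(buf, POINTER(FakeToken))
print([t.value for t in iter_reversed(mem, 4)])  # [40, 30, 20, 10]
```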

I'm wondering if anyone has accomplished this in a better way or whether
the get_rtokens() method would be useful to other people. What I really
want to do is relate a cursor to a token and then navigate forward and
backward from that token, but I can't see how to do that.
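
One possible approach to that last question, sketched in plain Python rather
than against the cindex API: tokenize once, keep the tokens sorted by start
offset, and use bisect to find the index of the token at a cursor's start
offset. From that index you can step backward or forward freely. `Tok` and
`TokenIndex` are hypothetical names; with cindex the offset would presumably
come from token.extent.start.offset, assuming all tokens are from one file.

```python
import bisect
from collections import namedtuple

# Hypothetical token record: start offset in the file, kind, spelling.
Tok = namedtuple('Tok', ['offset', 'kind', 'spelling'])


class TokenIndex(object):
    """Tokens of one file, sorted by offset, with lookup by position."""

    def __init__(self, tokens):
        self.tokens = sorted(tokens, key=lambda t: t.offset)
        self.offsets = [t.offset for t in self.tokens]

    def index_at(self, offset):
        """Index of the last token starting at or before `offset`, or -1."""
        return bisect.bisect_right(self.offsets, offset) - 1

    def before(self, offset):
        """Yield tokens preceding the token at `offset`, nearest first."""
        for i in range(self.index_at(offset) - 1, -1, -1):
            yield self.tokens[i]

    def after(self, offset):
        """Yield tokens following the token at `offset`, in file order."""
        for i in range(self.index_at(offset) + 1, len(self.tokens)):
            yield self.tokens[i]


# Tokens of "/** Size. */\n#define SIZE 4\n" with their byte offsets.
toks = [Tok(0, 'COMMENT', '/** Size. */'), Tok(13, 'PUNCTUATION', '#'),
        Tok(14, 'IDENTIFIER', 'define'), Tok(21, 'IDENTIFIER', 'SIZE'),
        Tok(26, 'LITERAL', '4')]
index = TokenIndex(toks)
print([t.spelling for t in index.before(21)])  # ['define', '#', '/** Size. */']
print([t.spelling for t in index.after(21)])   # ['4']
```

Because the file is tokenized only once, each lookup costs a binary search
instead of re-tokenizing from offset 0 for every #define.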

Thanks,

Brad Elliott