[cfe-dev] Problems and more problems with the python bindings

Joxean Koret joxeankoret at yahoo.es
Mon Jan 9 12:15:30 PST 2012


I'm currently writing a tool to do static analysis using the Python
bindings and one of the first tasks is to write a simple C-to-C

For this task, the AST objects that cindex generates need to expose more
information than they currently do (for example, the operator in
question for BINARY_OPERATOR/UNARY_OPERATOR, the literal value for
literals, etc.).

The hack I'm using to retrieve the information that the Python bindings
don't expose relies on the patches provided by Manuel Holtgrewe [1],
which expose the libclang tokenization APIs in the Python bindings. My
approach does the following:

1) Retrieve the location of the expression (the start and the end).
2) Read from the provided file the corresponding chunk of C code.
3) Tokenize this string and extract from it the interesting part.
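
Steps 1 and 2 above can be sketched without libclang at all. This is a
minimal sketch, not the code from the patches: the helper name
extract_chunk and the sample source are hypothetical, and it assumes
that locations are 1-based (line, column) pairs with an exclusive end
and that the file is plain ASCII, so that columns equal character
positions.

```python
def extract_chunk(text, start, end):
    """Return the source text between two (line, column) locations.

    Lines and columns are 1-based; `end` points one past the last
    character of the expression (an assumption of this sketch).
    """
    lines = text.splitlines(keepends=True)

    def to_offset(line, column):
        # Sum the lengths of all preceding lines, then add the column.
        return sum(len(l) for l in lines[:line - 1]) + (column - 1)

    return text[to_offset(*start):to_offset(*end)]


# Hypothetical file content; the `if` sits on line 3, as in the report.
source = (
    "int main(void)\n"
    "{\n"
    "    if ( 0 == 1 && 2 == 3 )\n"
    "        return 1;\n"
    "}\n"
)

chunk = extract_chunk(source, (3, 10), (3, 26))
print(chunk)
```

The resulting chunk is what step 3 would then hand to a tokenizer.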

However, this approach doesn't work for many reasons. Some of them:

1) The start and end offset of the expression returned by the python
bindings may be wrong. For example, the following expression:

  if ( 0 == 1 && 2 == 3 )
    // stuff

...will report 3 BINARY_OPERATOR expressions (which is correct) but with
wrong start and end offsets:

CursorKind.BINARY_OPERATOR <SourceLocation file '../tests/test7.c', line
3, column 10>
CursorKind.BINARY_OPERATOR <SourceLocation file '../tests/test7.c', line
3, column 10>
CursorKind.BINARY_OPERATOR <SourceLocation file '../tests/test7.c', line
3, column 20>

The 1st and 2nd BINARY_OPERATOR cursors both report the same start and
end offsets (those of the complete parenthesized expression), which is
wrong.
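
One reading of the output above is that the outer && expression simply
starts at the same column as its left operand, so the start location
alone cannot tell the two apart. A possible workaround, sketched below
under that assumption, is to ignore the operator's own location and
read the text between the extents of its two operands. The function
operator_between and the hand-computed offsets are hypothetical; with
cindex they would presumably come from the child cursors' extents
(extent.start.offset and extent.end.offset). The sketch would also
break if a comment or a macro sits between the operands.

```python
def operator_between(source, left_extent, right_extent):
    """Return the text between two operand extents, without whitespace.

    Each extent is a hypothetical (start, end) pair of 0-based
    character offsets into `source`, with `end` exclusive.
    """
    _, left_end = left_extent
    right_start, _ = right_extent
    return source[left_end:right_start].strip()


# Indented so that `0` is at column 10 and `2` at column 20.
line = "    if ( 0 == 1 && 2 == 3 )"

# Hand-computed operand extents for the three binary operators.
outer = operator_between(line, (9, 15), (19, 25))    # "0 == 1", "2 == 3"
first = operator_between(line, (9, 10), (14, 15))    # "0", "1"
second = operator_between(line, (19, 20), (24, 25))  # "2", "3"

print(outer, first, second)
```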

2) It simply doesn't work with macros. If I expect an INTEGER_LITERAL, I
search the tokens returned by the clang tokenization API for a
TokenKind.LITERAL token. However, if the source contains a macro (like
__LINE__, for example) instead of a literal, I don't get the correct
literal, because I'm reading the raw file without preprocessing (the
preprocessed buffer, if I'm not wrong, is not available).
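
The macro problem can be reproduced with any tokenizer that runs on raw,
unpreprocessed text. The sketch below uses a small regex-based stand-in
rather than the libclang tokenization API; the token kind names and the
raw_tokens helper are hypothetical, and keywords are lumped in with
identifiers for brevity.

```python
import re

# Toy lexer for raw C text; only covers what the examples below need.
TOKEN_RE = re.compile(
    r"""
      (?P<literal>\d+)
    | (?P<identifier>[A-Za-z_]\w*)
    | (?P<punctuation>==|&&|[=;()+\-*/<>!])
    """,
    re.VERBOSE,
)


def raw_tokens(chunk):
    """Return (kind, spelling) pairs for a chunk of unpreprocessed C."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(chunk)]


with_literal = raw_tokens("x = 42;")
with_macro = raw_tokens("x = __LINE__;")

print(with_literal)
print(with_macro)
```

In the second case no literal token shows up at all: the raw text only
contains the macro name, even though the AST, built after preprocessing,
holds an integer literal at that position.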

Well, I think my mail is already very long, so here are my questions:

1) Is there an easy way to retrieve this information (the relevant
operator for BINARY_OPERATOR, UNARY_OPERATOR, *_LITERAL, etc...) using
the current python bindings?

2) If not (as seems to be the case), can somebody tell me where to look
in Clang's source code to expose this information?

PS: Sorry for the long e-mail.

Thanks in advance,
Joxean Koret
