<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/68340>68340</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Clang Python Bindings cursor.get_tokens() fails when cursor's extent contains macro expansions
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
jnikula
</td>
</tr>
</table>
<pre>
The root cause may be the same as for #43451, but I'm reporting this separately for the Clang Python bindings.
If a cursor's extent contains macro expansions, `cursor.get_tokens()` may yield nothing or bogus tokens.
As a workaround, re-creating the cursor's extent and retrieving the tokens from the translation unit appears to always work as expected. Comparing the re-created extent and the cursor's original extent with `==` shows they're different, although their `__repr__` is exactly the same. This suggests the two extents differ in underlying data that `__repr__` does not show.
Here's an example and reproducer, saved as `test.py`, with `_cursor_get_tokens()` as the workaround:
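To narrow down where the two extents differ, a field-by-field dump may help. This is only a minimal sketch: `location_fields()` and `extent_diff()` are hypothetical helper names, not part of `clang.cindex`. They rely solely on the `file`, `line`, `column` and `offset` attributes of a `SourceLocation` and on the `start` and `end` attributes of a `SourceRange`.

```python
def location_fields(loc):
    """Return the publicly visible fields of a SourceLocation-like object."""
    # loc.file may be None (e.g. for a null location), so look up the name defensively.
    name = getattr(loc.file, "name", None)
    return (name, loc.line, loc.column, loc.offset)


def extent_diff(a, b):
    """Map 'start'/'end' to (fields_a, fields_b) for each endpoint that differs."""
    diff = {}
    for side in ("start", "end"):
        fields_a = location_fields(getattr(a, side))
        fields_b = location_fields(getattr(b, side))
        if fields_a != fields_b:
            diff[side] = (fields_a, fields_b)
    return diff
```

If `extent_diff(original, recreated)` comes back empty while `original == recreated` is still `False`, the difference would presumably be in internal data that these attributes do not expose.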
```
#!/usr/bin/env python3
import argparse

from clang.cindex import Index, SourceLocation, SourceRange, TranslationUnit


def _cursor_get_tokens(cursor):
    """Workaround: tokenize a re-created copy of the cursor's extent."""
    tu = cursor.translation_unit

    # duplicate the cursor's extent
    start = cursor.extent.start
    start = SourceLocation.from_position(tu, start.file, start.line, start.column)

    end = cursor.extent.end
    end = SourceLocation.from_position(tu, end.file, end.line, end.column)

    extent = SourceRange.from_locations(start, end)

    yield from tu.get_tokens(extent=extent)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('filename', action='store')
    args = parser.parse_args()

    index = Index.create()
    tu = index.parse(args.filename, options=TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD)

    for cursor in tu.cursor.walk_preorder():
        # hack for clang_Location_isFromMainFile
        if str(cursor.extent.start.file) != str(tu.cursor.extent.start.file):
            continue

        print(f'{cursor.kind} {cursor.spelling}')

        # first line: tokens straight from the cursor (affected by the bug)
        tokens = [t.spelling for t in cursor.get_tokens()]
        print(f'\t{tokens}')

        # second line: tokens via the workaround
        tokens = [t.spelling for t in _cursor_get_tokens(cursor)]
        print(f'\t{tokens}')


if __name__ == '__main__':
    main()
```
Reproducer C source, saved as `test.c`:
```
#include <stdbool.h>
#define LONG long
int i;
bool b;
LONG l;
struct s {
    int i;
    bool b;
    LONG l;
};
```
Partial results for running `./test.py test.c` (for each cursor, the first list is from `cursor.get_tokens()` and the second from the workaround):
```
CursorKind.VAR_DECL i
	['int', 'i']
	['int', 'i']
CursorKind.VAR_DECL b
	[]
	['bool', 'b']
CursorKind.VAR_DECL l
	['long', 'int', 'i', ';', 'bool', 'b', ';', 'LONG', 'l']
	['LONG', 'l']
CursorKind.STRUCT_DECL s
	['struct', 's', '{', 'int', 'i', ';', 'bool', 'b', ';', 'LONG', 'l', ';', '}']
	['struct', 's', '{', 'int', 'i', ';', 'bool', 'b', ';', 'LONG', 'l', ';', '}']
CursorKind.FIELD_DECL i
	['int', 'i']
	['int', 'i']
CursorKind.FIELD_DECL b
	[]
	['bool', 'b']
CursorKind.FIELD_DECL l
	['long', 'int', 'i', ';', 'bool', 'b', ';', 'LONG', 'l', ';', 'struct', 's', '{', 'int', 'i', ';', 'bool', 'b', ';', 'LONG', 'l']
	['LONG', 'l']
```
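Note how the results for `cursor.get_tokens()` vary depending on where the macro is defined: for `bool`, which comes from the included `stdbool.h`, it yields nothing, while for `LONG`, which is defined in `test.c` itself, it yields a bogus token list. In short, the method is unreliable whenever macros are involved.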
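To spot affected cursors without reading the output by eye, the two token lists can be compared programmatically. The following is a minimal sketch: `mismatched_cursors()` is a hypothetical helper name, and the sample data is copied from the first three cursors in the results above.

```python
def mismatched_cursors(results):
    """Return labels of cursors whose direct and workaround token lists differ.

    `results` is an iterable of (label, direct_tokens, workaround_tokens).
    """
    return [
        label
        for label, direct, workaround in results
        if list(direct) != list(workaround)
    ]


# Sample data taken from the partial results in this report.
sample = [
    ("VAR_DECL i", ["int", "i"], ["int", "i"]),
    ("VAR_DECL b", [], ["bool", "b"]),
    ("VAR_DECL l",
     ["long", "int", "i", ";", "bool", "b", ";", "LONG", "l"],
     ["LONG", "l"]),
]

print(mismatched_cursors(sample))  # ['VAR_DECL b', 'VAR_DECL l']
```

In the reproducer, the same check could be fed with the lists built from `cursor.get_tokens()` and `_cursor_get_tokens(cursor)` for each cursor.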
</pre>