<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/76472>76472</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[libclang] `annotateTokens()` produces different cursor than `visitChildren()`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
jimmy-zx
</td>
</tr>
</table>
<pre>
While testing the `annotateTokens()` function (used by `Token.cursor` in the Python binding), I found that for some cursors, the only token belonging to the cursor does not map back to that cursor.
For example, given the following code:
```c
struct a {
    int b;
};

int func(struct a *ptr) {
    int r = ptr->b;
    return r;
}
```
I wrote a script that selects the `DeclRefExpr` referring to `ptr` in the statement `int r = ptr->b;` and checks whether the cursor of that expression's only token (`ptr`) maps back to the expression's cursor.
```python
from clang.cindex import TranslationUnit, Cursor, CursorKind


def main():
    tu = TranslationUnit.from_source("./demo.c")
    root: Cursor = tu.cursor

    # Find the DeclRefExpr that refers to `ptr` (in `int r = ptr->b;`).
    node = None
    for node in root.walk_preorder():
        if node.kind == CursorKind.DECL_REF_EXPR and node.spelling == "ptr":
            break

    # Take the only token of that expression, i.e. `ptr`.
    token = None
    for token in node.get_tokens():
        break

    # token.cursor comes from clang_annotateTokens(); compare it with node.
    print(token.cursor == node)
    print(token.cursor._kind_id, node._kind_id)
    print(token.cursor.xdata, node.xdata)
    print(*token.cursor.data)
    print(*node.data)


if __name__ == '__main__':
    main()
```
The script prints:
```
False
101 101
0 0
140162768666120 140162768666224 140162768050240
None 140162768666224 140162768050240
```
The cursors `node` and `token.cursor` should be the same, and they do share the same spelling and extent. However, `libclang` considers them different cursors.
Cursor equality is provided by `clang_equalCursors()`, and the only difference between these two cursors is `data[0]`.
https://github.com/llvm/llvm-project/blob/1c1eaf75f5f2efd72ba813b29b3d7b556d61b70b/clang/tools/libclang/CIndex.cpp#L6289-L6303
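The field comparison can be reproduced without libclang. This is a minimal sketch: `CXCursorSketch` and `differing_fields()` are hypothetical names, and the structure only mirrors the `_kind_id`/`xdata`/`data` fields that the Python binding exposes on `Cursor`. It lists every field that differs, rather than reimplementing `clang_equalCursors()` exactly.

```python
import ctypes


class CXCursorSketch(ctypes.Structure):
    """Hypothetical mirror of the cursor fields seen from the Python binding
    (kind, xdata, and three opaque data pointers)."""
    _fields_ = [
        ("kind", ctypes.c_int),
        ("xdata", ctypes.c_int),
        ("data", ctypes.c_void_p * 3),
    ]


def differing_fields(x, y):
    """Return the names of the fields in which two cursors differ."""
    diffs = []
    if x.kind != y.kind:
        diffs.append("kind")
    if x.xdata != y.xdata:
        diffs.append("xdata")
    for i in range(3):
        # A NULL c_void_p reads back as None, matching the script output.
        if x.data[i] != y.data[i]:
            diffs.append(f"data[{i}]")
    return diffs


Ptrs = ctypes.c_void_p * 3
# Values copied from the script output above (kind 101 is DeclRefExpr there).
token_cursor = CXCursorSketch(101, 0, Ptrs(140162768666120, 140162768666224, 140162768050240))
node_cursor = CXCursorSketch(101, 0, Ptrs(None, 140162768666224, 140162768050240))

print(differing_fields(token_cursor, node_cursor))  # ['data[0]']
```

Only `data[0]` differs; the statement pointer (`data[1]`) and translation unit (`data[2]`) are identical.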
I suspect that `DeclRefExpr` cursors are created in `MakeCXCursor()`, where `data[0]` probably holds the parent.
https://github.com/llvm/llvm-project/blob/1c1eaf75f5f2efd72ba813b29b3d7b556d61b70b/clang/tools/libclang/CXCursor.cpp#L570-L583
https://github.com/llvm/llvm-project/blob/1c1eaf75f5f2efd72ba813b29b3d7b556d61b70b/clang/tools/libclang/CXCursor.cpp#L876-L878
Possibly the `data[0]` (parent) field is not being set properly, or perhaps `clang_equalCursors()` should ignore `data[0]` when comparing statements?
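Until this is resolved, a caller who only needs to know whether two cursors denote the same piece of source can avoid `==`. A minimal sketch, assuming source identity is what matters: the hypothetical `cursor_key()` combines kind, spelling, file name and extent offsets (ordinary `clang.cindex.Cursor` attributes), which are the properties reported above as matching. Stand-in objects let the sketch run without libclang; their offsets are illustrative, not taken from a real run.

```python
from types import SimpleNamespace


def cursor_key(cursor):
    """Hypothetical comparison key: kind, spelling, file name and extent
    offsets. The file name keeps equal offsets in different files apart."""
    start, end = cursor.extent.start, cursor.extent.end
    filename = start.file.name if start.file is not None else None
    return (cursor.kind, cursor.spelling, filename, start.offset, end.offset)


def same_source_cursor(a, b):
    """Compare two cursors by source identity, not clang_equalCursors()."""
    return cursor_key(a) == cursor_key(b)


def fake_cursor(kind, spelling, start, end, filename="demo.c", parent=None):
    """Stand-in shaped like clang.cindex.Cursor for running the sketch."""
    f = SimpleNamespace(name=filename)
    extent = SimpleNamespace(start=SimpleNamespace(file=f, offset=start),
                             end=SimpleNamespace(file=f, offset=end))
    # `parent` mimics the differing data[0]; cursor_key() never reads it.
    return SimpleNamespace(kind=kind, spelling=spelling, extent=extent,
                           parent_ptr=parent)


from_visit = fake_cursor("DECL_REF_EXPR", "ptr", 55, 58, parent=None)
from_token = fake_cursor("DECL_REF_EXPR", "ptr", 55, 58, parent=0x1234)
print(same_source_cursor(from_visit, from_token))  # True
```

In the script above, `same_source_cursor(token.cursor, node)` could stand in for `token.cursor == node`. Because it ignores the parent pointer, it treats as equal any two cursors of the same kind and spelling that cover exactly the same source range.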
</pre>