[clang] #101784 part 1: introduce ctyped in an independent manner (PR #101941)
Vlad Serebrennikov via cfe-commits
cfe-commits at lists.llvm.org
Wed Aug 7 11:32:33 PDT 2024
Endilll wrote:
I've been staring at `cindex.py` for the better part of yesterday and today. I'll try to describe the status quo as I understand it first, to make sure all three of us are on the same page, then I'll describe the direction I think we should take.
Status quo
------------
1. I believe we're looking at three principal levels here: (1) `libclang` shared library, which exports functions; (2) function signatures of those exported functions described via `ctypes` (`functionList`), and (3) user-facing cindex API implemented in terms of said functions.
2. `libclang` itself is considered a bedrock for `cindex.py`. Not much to discuss here.
3. `ctypes` function signatures are required to describe how to call exported functions. Typical entry in `functionList` looks like the following: `("clang_isConstQualifiedType", [Type], bool)`, where function name is followed by a list of parameter types, which in turn is followed by return type. This information is used to fill in `argtypes` and `restype` attributes of the imported function object respectively. (@DeinAlptraum so `functionList` is there to stay one way or another.)
4. cindex API is typically a rather thin layer on top of `ctypes` function signatures, which provides convenient classes and methods.
5. There are two quirks to how values are returned from `libclang` into Python realm. The first one is about structural types. When exported function returns such a type, on `ctypes` side we need to declare a Python class that derives from `ctypes.Structure`, and fill in `_fields_` member describing the layout. Then this type can be used to initialize `restype` attribute of imported function object. When this function is called, under the hood `ctypes` constructs the Python object according to layout and returns it.
```python
class SourceLocation(Structure):
_fields_ = [("ptr_data", c_void_p * 2), ("int_data", c_uint)]
```
6. The second quirk is about conversions of returned objects. If we need to return something else than what is specified in `restype`, e.g. a string, we need to perform a conversion. Conveniently, imported function objects has `errcheck` attribute that accepts a function. It is expected to perform checks (and raise an exception if they don't pass), but more importantly for us, what this function returns is what the call to imported function object returns, so it can be used as a conversion function as well (such usage is supported by the official documentation). This is what the 4th member of entries in `functionList` used for: `("clang_getFileName", [File], _CXString, _CXString.from_result)`.
Recent developments
------------------------
7. I believe our type annotation efforts up until now (mostly @DeinAlptraum patches) have been focused on cindex API.
8. One of the problems we've encountered while doing that is that imported function we rely on are not typed. This is not an issue when passing arguments (typed -> untyped is fine), but is a problem for the values we return (untyped -> typed make type checkers complain). This is the reason why we have so many `# type: ignore [no-any-return]` comments on `return` statements in `cindex.py`.
9. It seems that `ctypes` does not care about type hints, despite the fact that function signatures are specified. This is quite unfortunate, and leads to lack of type annotations for the layer of imported functions. I wasn't able to find any widely deployed solution for this.
10. It seems that the approach @TsXor is pursuing is to fill in the gap between `ctypes` and type hinting by introducing machinery that generates type hints for imported functions from their signatures (remember, we need to specify them anyway for `ctypes` to understand how to call them). This is a cleaner approach to resolve the issue with untyped returns than silencing type checkers with `# type: ignore [no-any-return]`.
Proposed direction
---------------------
11. I think adding type hints has two primary goals: enable users to do type checking, and provide more information to autocompletion engines. Specifically, cindex API is what needs to be annotated for users to get those benefits. I'm not sure why anyone would use "raw" imported function when there's much more convenient API in the same module.
12. I think our "current" approach supported by @DeinAlptraum work allows us to achieve both those primary goals.
13. I'm sympathetic to the "new" approach @TsXor is taking, but when I look at it for the standpoint of the primary goals, I don't see significant improvements that would justify the significant complexity it has over "current" approach.
14. `ctyped` is a useful work, but not for Clang, because we don't have enough maintainers experienced in this particular corner of Python. The problem it solves should be solved upstream, in `ctypes` or `typing` modules of the Python standard library. Once there is an upstream solution, we can reconsider the implementation approach of our type hints.
15. We shouldn't rely `errcheck` when we can help it. I find this to be a bit too much magic when one looks at the implementation of cindex API trying to understand what happens between `libclang` and cindex API.
16. Relying less on `errcheck` should help with contentious `# type: ignore [no-any-return]` comments sprinkled all over the place.
```python
# bad
def get_template_argument_kind(self, num: int) -> TemplateArgumentKind:
return conf.lib.clang_Cursor_getTemplateArgumentKind(self, num) # type: ignore [no-any-return]
# good
def get_template_argument_kind(self, num: int) -> TemplateArgumentKind:
return TemplateArgumentKind.from_id(conf.lib.clang_Cursor_getTemplateArgumentKind(self, num))
```
17. Hopefully the similar transformation can help us when structural types are returned. In this case, it's not `errcheck` that works, but `_fields_` and internal `ctypes` machinery. I'm a bit surprised that `ctypes` doesn't do as much for us, because this seems rather trivial to implement.
```python
# bad
def start(self) -> SourceLocation:
return conf.lib.clang_getRangeStart(self) # type: ignore [no-any-return]
# good
def start(self) -> SourceLocation:
return SourceLocation(conf.lib.clang_getRangeStart(self))
```
18. In #101784 I saw more fixes than `ctyped` and things around that. You're encouraged to submit them separately.
I hope this long write-up help us to get on the same page, and find a way forward.
CC @AaronBallman hopefully this is accessible enough for you to wrap your head around if you ever wanted to know how our Python binding work. Maybe we can turn this into documentation.
https://github.com/llvm/llvm-project/pull/101941
More information about the cfe-commits
mailing list